PREFACE



This work is a detailed empirical examination of
Professor Patrick Suppes' ideas about the syntax and
semantics of natural language. Readers familiar with his
work will recognise the debt that I owe to Professor
Suppes.

Several persons deserve special mention. The
members of my committee: Julius Moravcsik, John McCarthy,
and Dov Gabbay; for collecting the ERICA corpus: Arlene
Moskowitz; for his superb understanding of computer
science: David Levine; for their assistance in
statistics: Mario Zanotti and Charles Dunbar; for
editing: Dianne Kanerva and Florence Yager; for reading
the complete text: Edward Bolton; for the most detailed
and patient assistance I received: my wife, Nancy Smith.

I would also like to thank the following good
people for their assistance at many points and in many
different ways: Barbara Anderson, Naomi Baron, Marnle
Beacd, Lee Blaine, Alex Cannara, Phyllis Cole, Clark Crane,
Kathleen Doyle, Dexter Fletcher, Jamesine Friend, Bet^
GauMBon, Adele Goldberg, Pentti Kanerva, Joanne Leslie,
ByxMy Mancha, Lillian O'Toole, Ron Roberts, Marguerite
Shaw, Rainer Schulz, Steve Veyer, Robert Winn.

The entire dissertation was done on the IMSSS
PDP-10 and the Stanford AI PDP-10, mostly at IMSSS. As a
result, the format is somewhat different from dissertations
typed on a conventional typewriter. Linear notation is
used throughout. Exponentiation is indicated by a symbol
written on the line, as in

x*2

which is read "x square". References to footnotes occur on
the line, rather than above, as is customary. In some
chapters (especially 6), the format is a bit unusual.
These inconveniences are, I believe, offset by the fact
that performing this research and reporting on it in any
detail is almost impossible without the computer.

Partial support for the research presented in this 
dissertation was supplied by the National Science 
Foundation under grant NSF-GJ443X. 






TABLE OF CONTENTS

CHAPTER 1 — INTRODUCTION
    I. THE EXPERIMENT
    II. BACKGROUND — PREVIOUS WORK
    III. THE APPROACH TO THE DATA
    IV. TOWARDS A COMPUTER-PERFORMANCE THEORY OF AMBIGUITY
    V. METHODOLOGY AND ASSUMPTIONS
    VI. CONCLUSIONS

CHAPTER 2 — THE ERICA CORPUS
    I. THE SELECTION OF A CORPUS
    II. SUPERFICIAL SYNTACTICAL FEATURES
    III. UTTERANCES: NOTATION AND CONVENTIONS
    IV. COMPARISON OF ERICA AND ADULT VOCABULARIES
    V. IMITATION OF WORD USAGES
    VI. COMPARISON OF THE CORPUS VOCABULARY
        TO THE VOCABULARY OF WRITTEN ENGLISH
    VII. DICTIONARY CONSTRUCTION
    VIII. WORD CLASSIFICATIONS
    IX. GOODNESS-OF-FIT TESTS ON THE ERICA
        AND ADULT DICTIONARIES

CHAPTER 3 — FORMAL DEVELOPMENTS
    I. GENERATIVE GRAMMARS
    II. THE RELATION OF GENERATIVE GRAMMARS TO AUTOMATA
    III. DERIVATIONS AND TREES
    IV. CHOMSKY NORMAL FORM GRAMMARS
    V. LEXICAL SIMPLIFICATION OF CONTEXT-FREE GRAMMARS

CHAPTER 4 — A GRAMMAR FOR ERICA
    I. THE SIMPLE MODEL
    II. PROBABILITY AND LINGUISTICS
    III. MAXIMUM LIKELIHOOD AND ESTIMATIONS
    IV. CHI-SQUARE AND GOODNESS OF FIT TESTS
    V. GEOMETRIC MODELS FOR CFG
    VI. LEXICAL AMBIGUITY AND PROBABILISTIC GRAMMARS
    VII. THE GRAMMAR GE1
    VIII. LEXICAL AMBIGUITY IN THE ERICA CORPUS
    IX. PROBABILISTIC GRAMMARS AND UTTERANCE LENGTH

CHAPTER 5 — SEMANTICS
    I. METAMATHEMATICAL SYNTAX AND SEMANTICS
    II. CONTEXT-FREE AND METAMATHEMATICAL SYNTAX
    III. MODEL STRUCTURES AND CFG
    IV. SEMANTICS FOR ERICA
    V. SEMANTICS FOR GE1

CHAPTER 6 — THE SEMANTICS OF ERICA
    I. THE SEMANTICS OF THE GRAMMAR GE1
        1. ADJECTIVE PHRASE RULES
        2. ADVERBIAL PHRASE RULES
        3. QUANTIFIER-ARTICLE RULES
        4. ADJECTIVE PHRASE RULES — POSSESSIVE ADJECTIVES
        5. RULES FOR ADJECTIVE-PHRASES NOT PRECEDING
           NOUN PHRASES
        6. RULES INTRODUCING POSSESSIVES
        7. NOUN-PHRASE RULES
        8. VERB-PHRASE RULES
        9. RULES FOR NOUN-PHRASES THAT STAND ALONE
        10. RULES GENERATING SENTENCES
        11. PREPOSITIONAL PHRASE GENERATION
        12. SUBJECTS OF SENTENCES
        13. UTTERANCE-GENERATING RULES
    II. GRAMMATICAL AND SEMANTICAL AMBIGUITY
    III. PROBABILISTIC DISAMBIGUATION

BIBLIOGRAPHY

INDEX

(Appendices 1-7 are not included in this report.)



CHAPTER 1 — INTRODUCTION 



I. THE EXPERIMENT

My purpose in this work is to add weight to the
proposal that model-theoretic semantics of the type first
proposed by Tarski (1) is a useful tool for understanding
the semantics of natural languages. This approach has been
considered in very sophisticated ways (2); but it is seldom
that a discussion of model-theoretic semantics has centered
around a corpus of spoken or written English actually
gathered under empirically sound conditions (3).

My first aim is to lay out such an experiment. I
have completed the editing of a series of recordings
between a 32-month-old child (Erica by name) and several
adults. An extended description of this corpus is given in
Chapter 2. To manage this corpus, which runs several
hundred pages, I have transcribed the text onto the PDP-10
timesharing system at the Computer Based Laboratory of the
Institute for Mathematical Studies in the Social Sciences,
and I have written a number of programs to assist in the
analysis.

(1) Alfred Tarski, "The Concept of Truth in
Formalized Languages", in Logic, Semantics, and
Metamathematics, London, 1955.

(2) See, for example, the series of papers by
Richard Montague, some of which are listed in the
Bibliography to this work.

(3) See, for example, the articles by Patrick
Suppes and Elizabeth Gammon listed in the Bibliography of
this work.

The use of the computer is an essential part of
this work. In the beginning, the computer was used solely
as a bookkeeper for the detail I could not manage alone,
but as the analysis progressed the computer played a
conceptually more important role.

II. BACKGROUND — PREVIOUS WORK 


Set-theoretical semantics is a standard way of
discussing the meaning of the formal languages of
mathematical logic. The standard body of results known as
model theory leaves little doubt as to the power of this
method, whereby such historically important concepts as
entailment, inference, truth, tense, and modality are
opened to scientific examination in a comprehensive way.
The major problem of relating these results to the
questions surrounding the semantics of natural languages
involves the characterization of the syntax of natural
language in a way that relates it to the proposed
semantics.

A. ENGLISH AS A FORMAL LANGUAGE — MONTAGUE

Let me briefly review here the important work of
Professor Richard Montague in connection with the semantics
of natural languages (4). Montague bases his syntax of
English on the notion of grammatical category in a system
similar to the categorial grammars of Polish logicians of
the 1930's (5). The semantics is then based on a tensed
intensional logic, an artificial language designed for the
perspicuity of its semantics. Montague gives several
examples of English sentences, shows their translations
into his artificial language, and discusses the semantic
results as related to problems of intension, modality, and
quantification.

Montague raises an important issue with his
treatment of ambiguity. He remarks that a sentence can
have two or more different semantic interpretations, and
that these interpretations can correspond to alternative
informal analyses. Several sentences are offered that have
different semantic interpretations corresponding to de
dicto and de re modalities. An example of this kind of
modal ambiguity is the sentence:

    John seeks a unicorn.

Implicit in his remarks is the idea that competing
philosophical views can be formally represented by
alternative semantical interpretations.

(4) Specifically I will discuss the article:
Richard Montague, "The Proper Treatment of Quantification
in Ordinary English", forthcoming in Approaches to Natural
Language (J. Hintikka, J. Moravcsik, and P. Suppes,
editors), Dordrecht, Holland.

(5) Montague cites K. Ajdukiewicz, Język i
Poznanie, Warsaw, 1960, as a source for his work.

More directly relevant to my work, several of
Montague's sentences involve ambiguities resulting from
other causes than modality. He notes that the sentence:

    A woman loves every man.

can have two meanings, and follows through by showing that
his semantics yields both of the following interpretations,
here symbolized in my own notation.

    1) (∃x)[WOMAN(x) ∧ (∀y)(MAN(y) -> LOVE(x,y))]

    2) (∀y)[MAN(y) -> (∃x)(WOMAN(x) ∧ LOVE(x,y))]

Montague does not reject alternative semantic
interpretations as being spurious. Unfortunately, he has
no theory for handling them either.
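
The difference between the two readings can be checked on a
small finite model. The following is only an illustrative
sketch: the individuals and the LOVE relation are invented
for the example, and the functions simply evaluate
interpretations 1) and 2) by quantifying over finite sets.

```python
# Hypothetical toy model; the names and the relation are invented for illustration.
women = {"ann", "beth"}
men = {"carl", "dave"}
loves = {("ann", "carl"), ("beth", "dave")}


def reading_1(women, men, loves):
    """Reading 1: some one woman loves every man (existential has wide scope)."""
    return any(all((x, y) in loves for y in men) for x in women)


def reading_2(women, men, loves):
    """Reading 2: every man is loved by some woman (universal has wide scope)."""
    return all(any((x, y) in loves for x in women) for y in men)


print(reading_1(women, men, loves))  # False: no single woman loves both men
print(reading_2(women, men, loves))  # True: each man is loved by some woman
```

In this model the sentence is false on the first reading and
true on the second, so the two interpretations are
genuinely distinct truth conditions.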

B. PROBABILISTIC GRAMMARS — SUPPES AND GAMMON

My work is closely related to the work of Professor
Patrick Suppes and his student Dr. Elizabeth Gammon, so I
will discuss their contributions briefly here, and in more
detail in the later chapters.

In "Probabilistic Grammars for Natural Languages"
(6), Suppes assigns probabilities to the production rules
of a phrase-structure grammar, and suggests that such
[...] actual speakers. Suppes explains:

    The probabilistic program is meant to be
    supplementary rather than competitive with
    traditional investigations of grammatical
    structure. The large and subtle linguistic
    literature on important features of natural
    language syntax constitutes an important and
    permanent body of material. One objective of
    a probabilistic grammar is to account for a high
    percentage of a corpus with a relatively simple
    grammar and to isolate the deviant cases that need
    additional analysis and explanation. At the
    present time, the main tendency in linguistics is
    to look at the deviant cases and not to
    concentrate on a quantitative account of that part
    of a corpus that can be analyzed in relatively
    simple terms. (7)

(6) Patrick Suppes, "Probabilistic Grammars for
Natural Languages", Technical Report No. 154, Institute
for Mathematical Studies in the Social Sciences, Stanford,
California.

(7) [Suppes-1], pp. 4-5.

Two important motives for Suppes' use of
probabilistic grammars are 1) determination of the central
(syntactic) tendencies, and 2) isolation of (syntactic)
problems for further study. These motives are also central
in my work, but with semantics as the primary goal. As an
example of the application of a probabilistic grammar,
Suppes demonstrates the use of probabilistic grammars in
the prediction of utterance length (8).
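
To make the idea of a probabilistic grammar concrete, here
is a minimal sketch. The grammar, its category names, and
all rule probabilities are hypothetical, invented for this
example and not taken from Suppes' reports. Each derivation
gets the product of its rule probabilities, and summing over
derivations of equal length gives a predicted distribution
of utterance length.

```python
from itertools import product as cartesian
from math import prod

# Hypothetical grammar: each left-hand side maps to (right-hand side, probability)
# pairs, and the probabilities for one left-hand side sum to 1.
GRAMMAR = {
    "S": [(("NP",), 0.4), (("NP", "V"), 0.6)],
    "NP": [(("N",), 0.7), (("ADJ", "N"), 0.3)],
}


def expansions(symbol):
    """Yield (category_string, probability) for every derivation from symbol."""
    if symbol not in GRAMMAR:  # a lexical category, treated as terminal here
        yield (symbol,), 1.0
        return
    for rhs, p in GRAMMAR[symbol]:
        # Combine every choice of sub-derivation for the symbols on the right.
        for parts in cartesian(*(list(expansions(s)) for s in rhs)):
            string = tuple(t for s, _ in parts for t in s)
            yield string, p * prod(q for _, q in parts)


def length_distribution(start="S"):
    """Probability of each utterance length implied by the rule probabilities."""
    dist = {}
    for string, p in expansions(start):
        dist[len(string)] = dist.get(len(string), 0.0) + p
    return dist


print({n: round(p, 2) for n, p in sorted(length_distribution().items())})
# {1: 0.28, 2: 0.54, 3: 0.18}
```

The enumeration terminates only because this toy grammar is
not recursive; a recursive grammar would need a length bound
or an analytic treatment.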

Suppes uses the noun-phrases from the ADAM-1 corpus
of Roger Brown for the construction of probabilistic
grammars (9). However, the ADAM-1 corpus is not
sufficiently large or protracted for this kind of work.

Dr. Elizabeth Gammon continues the study of
probabilistic grammars in a later paper (10) concerning the
language of basal readers. The thrust of Gammon's work is
the analysis of instructional materials; however, I have
benefited from looking at the techniques she uses for
classifying words into lexical categories and constructing
grammars. Gammon also uses categorial grammars (similar to
Montague's syntax), so it is interesting to see the
relative merits of generative grammars and categorial
grammars. Context-free grammars have the advantage of
being closer to current notation in linguistics; more
deeply, context-free grammars allow the use of more
parameters than the usual categorial grammars, so I
consider only the use of context-free grammars.

(8) Patrick Suppes, "Semantics of Context-Free
Fragments of Natural Languages", Technical Report No. 171,
IMSSS, Stanford, California. See especially pp. 20-28.

(9) See [Suppes-1] and [Suppes-?].

(10) Elizabeth Macken Gammon, "A Syntactic Analysis
of Some First-Grade Readers", Technical Report No. 155,
Institute for Mathematical Studies in the Social Sciences,
Stanford University.

Neither Suppes nor Gammon considers in any detail
the problem of classifying words as to grammatical type,
although both of them assume that this is done prior to the
analysis. (Editors made the classifications for ADAM-1 and
for Gammon's basal readers.) Montague considers only a few
words ('walks', 'loves', 'ninety', 'temperature') and is not
concerned with any empirical problems. I think that an
empirical theory such as mine must consider the problem of
dealing with several thousand words in a convenient way,
particularly for computer implementation. Hence, I have
used a dictionary to provide information about the
grammatical functions that words can perform.

C. SEMANTICS OF CONTEXT-FREE LANGUAGES — SUPPES

In his more recent work (11) Suppes has become
primarily concerned with semantics. In "Semantics of
Context-Free Fragments of Natural Languages", Suppes gives
a context-free grammar for the noun-phrases in ADAM-1, and
defines semantic functions on the rules of that grammar.
Suppes emphasises the use of simple semantic functions in
as many cases as possible, attempting to isolate remaining
difficulties.

(11) Patrick Suppes, "Semantics of Context-Free
Fragments of Natural Languages", Technical Report No. 171,
IMSSS.

In the main, I have used Suppes' formulations for
semantics rather than Montague's. (See Chapter 5 for my
formulation.) Suppes bases his semantics on a context-free
grammar and does not translate his English syntax into some
artificial language prior to semantic analysis. These are
advantages to his approach, I believe.

In considering alternative semantical functions for
certain constructions (mainly the "double noun"
construction as in the phrases 'Daddy suitcase' and 'Baby
Ursula'), Suppes also allows alternative semantic
interpretations. Unfortunately, these alternative semantic
interpretations do not in Suppes' system necessarily rest
on alternative syntactic representations (or "trees"), as
was the case in Montague's work.

There are two main problems involved here. First,
it is my belief that syntax and semantics correspond very
closely, so I would prefer to have a different syntactic
structure to represent each semantic interpretation. In
addition, any help that a probabilistic grammar may have in
selecting between alternative semantic interpretations is
obscured by having two or more semantic interpretations
arise from one syntactic representation.

III. THE APPROACH TO THE DATA

In the context of previous work, the purpose of my
work is to supply a detailed examination of a large corpus
of data using mainly the methods of Professor Suppes, and
to extend those methods where possible. In the case of
Suppes' work on ADAM-1, the size of the corpus and the age
of the child required Suppes to confine his analysis, in
the main, to the noun-phrase fragment of ADAM-1. With the
larger ERICA corpus, I have written a more complete
utterance grammar and semantics. The size of the ERICA
corpus (over 9,000 child utterances) has made this a large
task of computation and data manipulation.

While Montague's work is not addressed to any
empirical problems, nevertheless I believe that theoretical
work similar to his can benefit from empirical work in two
ways. First, there is a tendency in theoretical work to be
confined to one's own small sample of sentences, and a
danger of error if the only criterion of success is the
force, largely psychological, of a few competing examples
and counterexamples. Second, there is the chance that
theoretically interesting examples may abound in empirical
data. An example of this kind, I believe, is the beginning
of an extension of the theory of definite descriptions that
I have given in Chapter 5, based on the uses of the word
'the' in ERICA.

Theories of language have been labeled as being
competence or performance theories. Admitting this
terminology, my work is decidedly in the performance camp,
although not with any hostility. In fact the two kinds of
research are both important. I call the basic approach of
this work "computer-performance". By this I mean that I am
trying to describe linguistic behavior with a theory that
is largely implementable on a computer. I am not really
arguing the relative computational abilities of the
computer and the human mind, or the nature of intelligence
and how to develop it artificially. Rather, I am using the
computer as a tool for formulating and testing a theory in
an exact way.

IV. TOWARDS A COMPUTER-PERFORMANCE THEORY OF AMBIGUITY

I am trying to develop a methodology for
linguistics research that will allow the comparison of
conflicting philosophical/linguistic views in a
scientifically acceptable way, building on the results in
these areas, and bringing them into focus around a
performance theory. Because of the pervasiveness of
ambiguity in any theory of language, I have devoted a good
part of this work to considering how to handle ambiguity.

I identify and distinguish several kinds of
ambiguity in ERICA (as I refer to the corpus), which are:

1) Lexical ambiguity: ambiguity due to multiple
entries in the dictionary (Chapters 2 and 4).

2) Grammatical ambiguity: ambiguity present
syntactically in a grammar (Chapters 3 and 4).

3) Semantic ambiguity: two (or more) "meanings"
for an utterance (Chapter 6).

I believe that many problems of the semantics of
natural languages can be characterized as problems of
ambiguity. I think that each utterance in English has only
a small number of "plausible" semantic interpretations.
The alternative is, I believe, to adjudge the human
language processing facility as arbitrarily complex and
inherently anomalous.

My analysis of the "plausible" is in probabilistic
terms. Given the syntax provided by the probabilistic
grammar, the obvious extension is to let the probability of
a semantic interpretation be the probability of the
syntactic structure(s) associated with that interpretation.
(Two or more syntactic representations of a sentence may
have the same semantic interpretation, I believe.)
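
One way to write this proposal down is sketched below. The
notation is an assumption introduced here, not the author's:
T(u) stands for the set of syntactic trees the grammar
assigns to an utterance u, p(r) for the probability of a
production rule r, and sem(t) for the interpretation
computed from a tree t. Summing over trees is one reading of
"structure(s)": trees that share an interpretation pool
their probability.

```latex
% Illustrative notation only (requires amsmath for \substack).
% P(t): probability of a tree, the product over the rule applications in it.
% P(\sigma, u): probability that utterance u carries interpretation \sigma.
P(t) = \prod_{r \in t} p(r),
\qquad
P(\sigma, u) = \sum_{\substack{t \in T(u) \\ \mathrm{sem}(t) = \sigma}} P(t)
```

Dividing P(σ, u) by the sum over all trees in T(u) would
give the relative plausibility of σ among the
interpretations of u.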




The use of the probabilistic grammar in
disambiguating provides an interesting check on the
relation of the syntax to the semantics. We can ask, for a
syntactic construction that has alternative semantic
representations, if the probabilities associated with those
interpretations correspond to our intuitions about the
utterances in the corpus using the construction.

I use probabilistic grammars to disambiguate in two
ways. First, there is in ERICA a certain amount of
ambiguity due to the dictionary (lexical ambiguity). This
kind of ambiguity is often only apparent and should be
dismissed without further consideration. In Chapter 4 I
discuss several ways to remove lexical ambiguity. The
most intuitively satisfactory method is to choose the
alternative with the highest probability. Secondly (for a
more detailed discussion see Chapter 6), I discuss the
grammatical ambiguity (ambiguity due to the grammar rather
than the dictionary) remaining in ERICA after all lexical
ambiguity has been removed, and I conduct a careful
examination of the success of probabilistic disambiguation
on these cases.
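
The selection step described above can be sketched in a few
lines. This is an illustration under stated assumptions, not
the procedure of Chapter 4: the category strings and their
probabilities are invented, and each alternative is assumed
to have already been scored by a probabilistic grammar.

```python
def disambiguate(alternatives):
    """Pick the alternative with the highest probability.

    alternatives maps each candidate analysis (here a tuple of lexical
    categories) to the probability the grammar assigns to it.
    Returns the winner and its share of the total probability.
    """
    total = sum(alternatives.values())
    if total == 0:
        raise ValueError("no alternative has positive probability")
    best = max(alternatives, key=alternatives.get)
    return best, alternatives[best] / total


# Hypothetical scores for two lexical readings of one two-word utterance.
scores = {("N", "V"): 0.020, ("V", "V"): 0.005}
best, share = disambiguate(scores)
print(best)             # ('N', 'V')
print(round(share, 2))  # 0.8
```

The share is reported alongside the winner because a narrow
margin between alternatives is itself a sign that the
ambiguity may be genuine rather than only apparent.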

Strictly interpreted, these results indicate a mixed
success. However, what they indicate to me are the many
ways in which the dictionary and the grammar can be
improved, and they suggest what features are causing the
major difficulties.

V. METHODOLOGY AND ASSUMPTIONS 

Let me summarize the basis of this work by listing
what I attempted to do as METHODOLOGY and the
justifications as ASSUMPTIONS.

A. METHODOLOGY 

1) The data base (Erica's memory, semantic
information) is characterized as a set-theoretical
structure (Chapter 5). The lexicon greatly simplifies the
kinds of things in this structure by classing words as
nouns, verbs, and so on.

2) The syntax of the child's speech is generated by
a context-free grammar, designed to remove most lexical
ambiguities by rejecting most alternative interpretations.
Remaining interpretations should represent genuine
ambiguities. Further ambiguity is handled by the
probabilistic nature of the grammar (which selects the
"most likely" interpretation as a first approximation).

3) The meaning of an utterance is computed by
set-theoretic functions into the 'objects' in the data
base.
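
The three steps above can be sketched together as follows.
Everything specific here is an assumption made for
illustration: the objects in the data base, the two rules,
and the choice of set intersection as the semantic function
for an adjective-noun phrase.

```python
# Hypothetical data base: each word denotes a set of objects.
DENOTATION = {
    "red": {"ball1", "truck1"},
    "ball": {"ball1", "ball2"},
    "truck": {"truck1"},
}

# One set-theoretic function per grammar rule (rules invented for this sketch).
SEMANTIC_FUNCTIONS = {
    ("NP", ("ADJ", "N")): lambda adj, noun: adj & noun,  # intersection
    ("NP", ("N",)): lambda noun: noun,                   # identity
}


def meaning(tree):
    """Compute a denotation bottom-up.

    A tree is either a word (str) or a triple (lhs, rhs_categories, children).
    """
    if isinstance(tree, str):
        return DENOTATION[tree]
    lhs, rhs, children = tree
    function = SEMANTIC_FUNCTIONS[(lhs, rhs)]
    return function(*(meaning(child) for child in children))


print(meaning(("NP", ("ADJ", "N"), ["red", "ball"])))  # {'ball1'}
```

Because the semantic function is attached to the rule
rather than to the words, a different tree for the same
words can yield a different denotation.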

B. ASSUMPTIONS

1) The "deep structure" of the semantics likely
corresponds to the "surface structure" of the syntax, at
least more than supposed.

2) The understanding of natural language is a
phenomenon open to our understanding to the point that we
can simulate it on a computing machine of reasonable size.

3) Much language processing is done in a
syntactical way (albeit in a way that corresponds to the
semantics). Certain semi-automatic linguistic reflexes are
learned in such a way that the full power of the semantic
machinery is not needed.

4) One need not be concerned that obvious
simplifications in the analysis (such as my handling of
quantifiers, verbs, adverbs) will so grossly misrepresent
the problem that the whole enterprise is valueless. This
is more than an article of faith in that it corresponds to
my feeling that speakers commonly simplify the semantic
structure of concepts in many ordinary contexts.
Quantifiers tend to look like simple adjectives, modal
concepts such as 'necessity' are assumed to be transparent,
and verbs look like simple 1-place predicates.

VI. CONCLUSIONS 

I draw the following conclusions from the work
reported here. These results are readily classed into
'empirical' and 'conceptual' issues.

A. EMPIRICAL ISSUES 

1) A reasonable probabilistic grammar for ERICA can
be constructed. My grammar GE1 recognizes 77 percent of
the ERICA corpus, removes most of the lexical ambiguity
present in the corpus, and introduces very little
grammatical ambiguity. (Chapters 4 and 6)

2) Further, the grammar GE1 can be used to complete
the process of lexical disambiguation in an impressive way
by selecting the most likely lexical alternative. This
method is apparently better than the other models of
lexical disambiguation that I suggest. (Chapter 4)

3) Semantically, the grammar functions reasonably
well. Many rules are obviously correct. Many of the
remaining problems can be ascribed to the need for a
dictionary that more completely describes the alternative
uses of words in the corpus, and to subtler rules. (In
this first pass of the data, I simply used a dictionary and
grammar constructed mostly a priori.) (Chapters 5 and 6)

B. CONCEPTUAL ISSUES 

1) There is a need, philosophically, to study the
performance side of linguistic concepts by looking at
corpora of data. (See for example the discussion of the
word 'the' in Chapter 5.)

2) There is a relation between the syntax of the
formal languages of mathematical logic and generative
grammars. This relationship provides a practical and
conceptual basis for the set-theoretical semantics of
context-free languages. (Chapter 5)

3) There is a tradeoff between symbols that denote
objects and symbols that call upon functions. This
tradeoff has implications, I believe, both for certain
philosophical disputations and for computer-based semantic
systems. (Chapter 5)

4) A useful part of a theory of set-theoretical
semantics can be the inclusion of one or more contextual
parameters, indicating sets of objects currently under
consideration in the conversation.

5) An extended theory of definite descriptions can
be made, using contextual parameters, that accounts for the
classical theory as well as the other observed uses of the
word 'the'.

6) The notion of probability can play a key role in
the construction of a semantics. This can be effected by
probabilistic grammars.

7) Simple set-theoretical functions are often
successful in describing the ERICA semantics. I have no
single measure of correctness, but rather a detailed
examination of the syntax rules and their associated
semantic functions.

CHAPTER 2 — THE ERICA CORPUS

I. THE SELECTION OF A CORPUS

Erica is a little girl. Arlene Moskowitz of
Berkeley collected recordings of Erica talking to adults,
usually to Arlene herself or to Erica's mother, but
occasionally to Erica's father. At the beginning of the
recording in 1969 Erica was 31 months old, and she was 33
months old at the end. (Erica was born on July 24, 1966.
Unfortunately, the dates of all the recordings are not
available.) The tapes were made in her family's apartment,
where the surroundings were familiar to Erica. An effort
was made to have normal conversation, and the impression
from the transcriptions is that the awareness of the
recording equipment was forgotten after the fourth or fifth
tape. Most of the recordings were of a one-hour session,
but some extended over several days, a few minutes each
day. Miss Moskowitz began the editing but did not finish,
so I cannot vouch for the authenticity of the data, except
to say that I have tried to edit the text myself, and that
I alone am responsible for any effect that remaining errors
may have on my results (1).

Several reasons persuaded me that the speech of a
child was the appropriate place to look for the data for
this experiment; these reasons are discussed below.

1) There was reason to believe that children's
speech was syntactically simpler than adult speech, and
this has proven to be the case. Compared to the adult text
in the ERICA corpus (giving a name to the corpus itself),
Erica's utterances are shorter, the vocabulary less rich,
and the structure is more repetitive. So, if by natural
language we mean spoken, informal conversation, the speech
of a child would be the natural candidate for a simple
beginning.

2) I had hoped that Erica's speech would be more
semantically straightforward compared to adult speech. I
have no reason to doubt that this assumption is correct.
Simple semantical functions appear to be successful in an
encouraging part of Erica's speech. This was not
surprising to me, since I expect semantical features of
language to have their syntactic counterparts. The
syntactical simplicity of child speech thus suggests a
semantical simplicity.

3) The developmental aspects of language and
concepts are philosophically interesting, and it is these
factors that one would most expect to find in the study of
child language, particularly if the study were well timed
and protracted, covering the first moments of speech well
into nursery school. Since the ERICA corpus was collected
sporadically and hastily (only two months from the first
recording to the last), the possibility of studying
language development in these particular data is remote.

(1) I would like to thank Barbara Anderson, Robert
Winn, and Florence Yager of the Institute staff for their
help in typing the ERICA corpus into the PDP-10 computer
for this analysis.

Given that we want to look at the semantics of
natural language, the question of the selection of a corpus
bears some discussion. The advantage in selecting child
language is that in it we are seeing something like the
real problems that natural language represents, in roughly
the right mixtures. It certainly would impress no one to
prove that model-theoretic semantics was useful for a
patently artificial language, say ALGOL-60. Moreover,
esoteric counterexamples to a model-theoretic approach
would not impress me as being reason to abandon the
project. What is needed is a detailed discussion of some
genuine data.

The price paid for this spontaneity is that the
data base for the meaning of the child's utterances is
constantly shifting and impossible to separate, even for a
moment of reflection, from such problems as perception and
memory. The child's conversations free-wheel as quickly as
the duration of attention span. The only recourse is to
back away from the individual utterances and their
inscrutable contexts and look for patterns that are more
readily studied in classes of utterances.

In retrospect, looking at a corpus of free
conversation is valuable for getting a feel for the kinds
of grammars and semantic functions that are best. The real
test should be conducted in a situation where the
discussion can be limited in content. One solution might
be to organize an experiment where children are encouraged
to talk about certain fixed subjects, such as facts about
baseball, or the objects strewn about the interviewing
room. Another solution might be to look at spoken or
written language concerning some precise subject matter
such as elementary mathematics.

II. SUPERFICIAL SYNTACTICAL FEATURES

The most striking and permanent feature of the
corpus is its size. There are 19,b2o utterances in all,
excluding utterances that were completely unidentifiable
during transcription, but including utterances that could
be partially understood. I used the symbol

    <xxx>

to indicate unintelligibility of all or part of an
utterance. Thus,

    Can you <xxx>.

would be included as an utterance of length three. Using a
similar notation,

    <n>, <v>, <adj>

stand respectively for noun, verb, and adjective, when the
exact word was not identifiable, but the editor thought she
had good reason for a grammatical classification. The
analysis of the length of utterances in this chapter first
eliminated the utterances that included the
unintelligibility symbol <xxx> since it might be standing
for a whole phrase that was garbled on the tape.

Comments were included occasionally in the text
when the editor believed that what she heard on the tape
was not fully described by the utterances themselves; also
comments about the situation leading up to the recording
session itself were included. Of course, comments were not
included in any syntactic study, and the comments were not
sufficiently regular to admit any organized use in the
semantic analysis, although I have noted the comments in
the course of reading the corpus.

The text was prepared by the straightforward
approach of trying to make a consistent and accurate copy
of a conversation. It may be argued that a special
representation, such as a phonetic system, would be more
appropriate. I have no reason to really think so at this
time, especially considering the problems that devising and
using such a system would create. Phonetic representations
were of course developed to capture the subtleties of
sound. While I did not use a phonetic approach, it is
clearly desirable from a semantic point of view. For
example, the sentence

    here it is

(unpunctuated!) can be either a question, a declaration, or
an exclamation depending on the emphasis and the raising
and lowering of the voice; these features are lost to my
analysis.

A full implementation of a theory of language on a
computer would of course include a system for recognizing
spoken English and translating it into some kind of normal
form. I assume that this translation would very much
resemble written English, and it is for this reason that I
defend the way ERICA was edited. If this assumption fails
then some different representation of spoken English would
have to be found.

III. UTTERANCES: NOTATION AND CONVENTIONS



The text is divided into utterances. If I were
pressed to name an objective criterion for making the
division between one utterance and the next I would suggest
time-lag between sounds. However, it is clear from
listening to the tapes that the editor has followed the
interaction semantically and is trying to unitize the
speech. That this is a natural process is indicated by the
fact that the transcription is little different from other
transcriptions of spoken English.

The units of speech seem to be rather like the
complete thoughts of classical grammar. However formally
elusive this idea may be, I am drawn to it by looking at
ERICA and comparing the divisions to what I imagine the
conversation to have been like as an interaction.

Once the transcription is complete it is easy to
define the delimitation of words in the utterances.

Notation: A word is an unbroken string of the
characters

a, b, c, ..., z, 0, 1, ..., 9, <, >

occurring in an utterance. Lower and upper case letters
are considered equivalent. The length of an utterance is
the number of words in it.

Several characters are taken as having special
significance.

1) The apostrophe ' is a part of words, as in
possessives and contractions. In the case of contractions,
the standard interpretation is taken formally in that we
treat the contraction as though it were two dictionary
words. However, a contraction only adds one to the length
of an utterance. This has the advantage of treating the
contraction in a way consistent with standard usage. The
price paid is that I lose a possible correspondence between
syntactical and semantical features of the utterance by
having one word stand for perhaps two semantical "units".

EXAMPLES OF USES OF THE APOSTROPHE

WORD        MEANING

Erica's     the possessive of Erica
doesn't     the contraction of a verb and a negating particle
men's       the possessive of men



2) The dash - is a part of words, as in

ring-around-the-rosy

which is counted as one word.

3) The question mark ? denotes questions.

4) Quotes " (but not single quotes, which are
not used due to the ambiguity with the apostrophe) indicate
quotations and use-mention distinctions. I am not
concerned with analyzing the semantics of these.

In standard English, punctuation characters (such
as commas and semicolons) often indicate phrasing in
sentences. I have not used these clues in the analysis
formally, but it could be done by including punctuation
characters as symbols generated by the grammar. Obviously
punctuation is needed as phrase markings at some level in
the analysis of natural language. Here I simply ignore
punctuation altogether.
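
The word and length conventions above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the character class (letters, digits, angle brackets, apostrophe, dash) is my reading of the Notation, and the function names are hypothetical rather than those of the program actually used on the PDP-10.

```python
import re

# Word characters, as read from the Notation: letters, digits, the angle
# brackets of symbols such as <xxx>, plus the apostrophe and the dash.
# The exact character class is an assumption made for this illustration.
WORD_PATTERN = re.compile(r"[a-z0-9<>'\-]+")


def words(utterance: str) -> list[str]:
    """Split an utterance into words, mapping upper case to lower case."""
    return WORD_PATTERN.findall(utterance.lower())


def utterance_length(utterance: str) -> int:
    """The length of an utterance is the number of words in it."""
    return len(words(utterance))


def mean_length(utterances: list[str]) -> float:
    """Mean length over the utterances that do not contain <xxx>."""
    kept = [u for u in utterances if "<xxx>" not in u.lower()]
    return sum(utterance_length(u) for u in kept) / len(kept)
```

Under these assumptions a contraction such as doesn't and a dashed word such as ring-around-the-rosy each add one to the length, and punctuation such as the question mark is ignored.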

Of the utterances in the corpus, ERICA had 8,9??
utterances with a mean length of 3.087, and ADULT had
10,695 utterances with a mean length of 4.830, excluding
any utterances that were in part unintelligible. (The
disparity between these numbers and the original counts of
9,085 and 10,740 reflects the number of partly
unintelligible utterances.) A more complete analysis of
the lengths of utterances in the corpus is included as
Appendix 1.

IV. COMPARISON OF ERICA AND ADULT VOCABULARIES

Using the familiar type-token distinction, the
ERICA corpus has 79,770 word tokens and 3,169 types. This
count includes the symbols for unrecognized words, such as
<n> used for a noun and <xxx> used for an
unclassifiable word, but does not include utterances that
were completely unintelligible. ERICA (the child's speech
in the complete ERICA corpus) has 27,922 tokens and 1,853
types; ADULT (the adults' portion) has 51,848 tokens and
2,867 types. Appendices 2 and 3 list the words in ERICA
and ADULT by rank and alphabetical ordering.

Obviously ERICA and ADULT have different
vocabularies, and neither one uses all the words found in
the other. However, it is of some interest to ask how
different these vocabularies are and to propose measures of
the difference. A simple test is to ask how many words
occur in one but not the other. Of the words in ERICA, 301
types were not represented in ADULT. This comparison gives
a misleading impression of the difference between the two
vocabularies, since these 301 types account for only 565
tokens out of the 27,922 tokens in the ERICA vocabulary.
The top 135 words in ERICA are all represented in ADULT,
and most of the words in ERICA not found in ADULT have a
small frequency, many occurring only once or twice.

If we look at the portions of the vocabularies with
frequency greater than or equal to 5 we get a better
impression of the similarity. There are 607 types in the
ERICA vocabulary with frequency greater than or equal to 5,
accounting for 25,678 tokens. Out of these, only 14 types,
for 159 tokens, are not to be found in the ADULT
vocabulary. Tables 1 and 2 summarize these results. Table
3 lists the words with frequency greater than or equal to 5
from ERICA not in ADULT at all, and Table 4 lists the words
found in ADULT (freq >= 5) but not found in ERICA. (The
string >= is read 'greater than or equal to'. Its use
here reflects the fact that this work is being composed on
the PDP-10 computer, and the use of >= is standard
linear notation.)
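
The comparison just described (types and tokens of one vocabulary that are absent from the other, optionally restricted to frequency >= 5) can be sketched as follows. This is an illustrative reconstruction, not the program used for the dissertation; the function name and the toy word lists are invented for the example.

```python
from collections import Counter


def vocabulary_difference(a: Counter, b: Counter, min_freq: int = 1) -> dict:
    """Compare vocabulary `a` against `b`.

    Only words of `a` with frequency >= min_freq form the sample; the
    result reports how much of that sample is missing from `b`.
    """
    sample = {w: f for w, f in a.items() if f >= min_freq}
    missing = {w: f for w, f in sample.items() if w not in b}
    types, tokens = len(sample), sum(sample.values())
    return {
        "types": types,
        "tokens": tokens,
        "missing_types": len(missing),
        "missing_tokens": sum(missing.values()),
        "percent_types": 100.0 * len(missing) / types,
        "percent_tokens": 100.0 * sum(missing.values()) / tokens,
    }


# Toy data, invented for illustration (not taken from the corpus).
erica = Counter("you want a cookie wanna cookie yup you you".split())
adult = Counter("do you want a cookie yes you do".split())
result = vocabulary_difference(erica, adult)
frequent = vocabulary_difference(erica, adult, min_freq=2)
```

The same arithmetic applied to the counts quoted in the text gives 301 / 1,853 = 16.24 percent of types and 565 / 27,922 = 2.02 percent of tokens, which is why the token measure gives the less misleading picture.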
TABLE 1

WORDS IN THE ERICA VOCABULARY NOT FOUND IN THE ADULT VOCABULARY

Complete ERICA Vocabulary

                                  Types      Tokens

Size of sample                    1,853      27,922
Words in ERICA not in ADULT         301         565
Percent not found                16.24%       2.02%

Portion of ERICA Vocabulary with Frequency >= 5

                                  Types      Tokens

Size of sample                      607      25,678
Words in ERICA not in ADULT          14         159
Percent not found                 2.31%        .62%
TABLE 2

WORDS IN THE ADULT VOCABULARY NOT FOUND IN THE ERICA VOCABULARY

Complete ADULT Vocabulary

                                  Types      Tokens

Size of sample                    2,867      51,848
Words in ADULT not in ERICA       1,315      [illegible]
Percent not found                45.87%       5.52%

Portion of ADULT Vocabulary with Frequency >= 5

                                  Types      Tokens

Size of sample                      945      48,485
Words in ADULT not in ERICA         106       1,067
Percent not found                11.22%       2.20%
TABLE 3

WORDS OCCURRING IN ERICA VOCABULARY NOT IN ADULT VOCABULARY

(Frequency >= 5)

Freq    Word

34      wanna
31      yup
16      lookat
13      momma
10      praaant
 7      aak ah tap yeh
 6      gobble luminum
 5      grapefruits mouses sweetie
TABLE 4

WORDS OCCURRING IN ADULT VOCABULARY NOT IN ERICA VOCABULARY

(Frequency >= 5)

Freq    Word

84      else
77      were
37      things
30      which
28      understand
26      looks
23      much
20      breakfast sure
18      correct really
16      yourself
13      certainly few
12      building delicious feet real
11      already envelope song than
10      behind humm sorry until
 9      count ears instrument minutes page tweet
 8      boom closet ever everybody phone sat taste thought
        tired told wa/i
 7      ate basket best cannot chickens each ceed fireplace
        goodness happens lean lid lie line living meadow
        mind push squares whisper you'll
 6      chinese comfortable its kitties lake lovely nafil
        once party poor rhyme set toby
 5      add ago anything apart bedroom different dinosaur
        dolly's fact growing haven't indians instruments
        loudly movie names park peck purr puts quite row
        rug sewing special stream television tooth you've
Some tentative conclusions are:

1. The ERICA and ADULT vocabularies are similar,
especially at the high-frequency ends of the distributions.
The bulk of their speech comes from the 1,552 words that
are common to both lists. Erica draws 97.98 percent of her
speech from the common vocabulary, and the adults 94.48
percent.

2. The ADULT vocabulary is more nearly a superset
of the ERICA vocabulary than conversely. This holds
throughout Tables 1 and 2. For example, only 16.24 percent
of the words in ERICA do not occur in ADULT, while 45.87
percent of the words in ADULT do not occur in ERICA.

V. IMITATION OF WORD USAGES

A reasonable hypothesis about the speech of a child
is that there is a strong tendency for the child to use
words recently used by an adult. As a simple test of
this hypothesis, let a usage of a given word be an
n-imitation occurrence if the word occurs in the
previous n adult utterances. Table 5 gives the results of
looking for n-imitations, n=1,2,...,8, on the twenty hours
of the ERICA corpus. To avoid confusing the comparisons,
no counting was done until 8 adult utterances were found at
the beginning of each hour.
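
The n-imitation count defined above can be sketched as follows. This is a minimal illustration, assuming that one hour of the transcript is available as an ordered list of (speaker, words) pairs with the speaker labels "ADULT" and "ERICA"; the data layout, labels, and function name are assumptions made for the example, not a description of the original program.

```python
from collections import deque


def count_imitations(utterances, n, warmup=8):
    """Count ERICA word usages that are n-imitations within one hour.

    A usage is an n-imitation if the word occurs in one of the previous
    n adult utterances. Nothing is counted until `warmup` adult
    utterances have been seen. Returns (imitations, non_imitations).
    """
    recent = deque(maxlen=n)  # word sets of the previous n adult utterances
    adults_seen = 0
    imitations = non_imitations = 0
    for speaker, words in utterances:
        if speaker == "ADULT":
            recent.append(set(words))
            adults_seen += 1
        elif adults_seen >= warmup:
            for word in words:
                if any(word in previous for previous in recent):
                    imitations += 1
                else:
                    non_imitations += 1
    return imitations, non_imitations


# Toy hour, invented for illustration.
hour = [
    ("ADULT", ["what", "is", "that"]),
    ("ERICA", ["that", "doggie"]),
    ("ADULT", ["a", "doggie"]),
    ("ERICA", ["doggie", "that"]),
]
```

Summing the two counts over all twenty hours for each n from 1 to 8 would give the N-IMITATION and NON-IMITATION columns of a table like Table 5; the percentage is the first count divided by their total.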
TABLE 5

ERICA WORD USAGES THAT IMITATE ADULT WORD USAGES
(FIRST 8 ADULT UTTERANCES IN EACH HOUR ARE IGNORED)

N     N-IMITATION     NON-IMITATION     % IMITATION

1         3424            24498            12.26
2         4939            22983            17.69
3         5932            21990            21.24
4         6729            21193            24.01
5         7386            20536            26.45
6         7929            19993            28.40
7         8415            19507            30.14
8         8816            19106            31.57

Word Types       3,169 (complete corpus)
Word Tokens     79,770 (complete corpus)
ERICA Tokens    27,922
ADULT Tokens    51,848
VI. COMPARISON OF THE CORPUS VOCABULARY
TO THE VOCABULARY OF WRITTEN ENGLISH

A standard computational analysis of written
English texts is contained in Computational Analysis of
Present-Day American English by Henry Kucera and W. Nelson
Francis (2). I want to compare the ERICA vocabulary to the
vocabulary for the [K-F] corpus of written speech. There
were 50,406 types in [K-F], representing 1,014,232 tokens.
The samples comprising the [K-F] were selected to be a
cross-section of contemporary American written English.

I have taken the 100 most common words in ERICA,
looked up their frequencies in [K-F], and then used the
[K-F] frequencies as the basis for the theoretical
frequencies of a chi-square test. I summed up the
frequencies for the 100 most frequent words in ERICA and
[K-F] and called these sums the OBSERVED-SUM and the
EXPECTED-SUM, respectively. The EXPECTED-FREQUENCY of a
given word was then the word's frequency in [K-F]
multiplied by

OBSERVED-SUM
------------
EXPECTED-SUM

(2) Brown University Press, 1967. Referred to as
[K-F].
The chi-square contribution of the given word was then
computed by the usual formula

(OBSERVED-FREQUENCY - EXPECTED-FREQUENCY)^2
-------------------------------------------
            EXPECTED-FREQUENCY

The results of this test are in Table 6. The indication is
that Erica's speech is rather different from written
English, even in terms of high-frequency words.
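
The relativized expected frequency and the chi-square contribution described above can be written out as a short Python sketch. The function names and the toy frequencies are assumptions made for illustration; the only values taken from the source are those of the row for 'what' in Table 6 (observed 1692, relativized expected 257.2393), which is used in the usage note below.

```python
def relativized_expected(reference_freq, observed_sum, expected_sum):
    """Scale a reference-corpus frequency by OBSERVED-SUM / EXPECTED-SUM."""
    return reference_freq * observed_sum / expected_sum


def chi_square_contribution(observed, expected):
    """(OBSERVED-FREQUENCY - EXPECTED-FREQUENCY)^2 / EXPECTED-FREQUENCY."""
    return (observed - expected) ** 2 / expected


def chi_square_test(observed, reference):
    """Return per-word rows and the chi-square sum.

    `observed` maps each word to its frequency in the corpus under study;
    `reference` maps the same words to their reference-corpus frequency.
    Each row is (observed, relativized expected, contribution).
    """
    observed_sum = sum(observed.values())
    expected_sum = sum(reference[word] for word in observed)
    rows = {}
    for word, obs in observed.items():
        exp = relativized_expected(reference[word], observed_sum, expected_sum)
        rows[word] = (obs, exp, chi_square_contribution(obs, exp))
    return rows, sum(row[2] for row in rows.values())


# Toy frequencies, invented for illustration.
rows, total = chi_square_test({"you": 30, "the": 10}, {"you": 10, "the": 70})
```

As a check on the formula, chi_square_contribution(1692, 257.2393) comes to about 8002.43, in agreement with the entry for 'what' in Table 6.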
TABLE 6

GOODNESS-OF-FIT TEST
FREQUENCIES FOR THE FIRST 100 WORDS IN ERICA ESTIMATED BY [K-F]
(RELATIVIZED)

RANK     WORD          OBSERVED    REL. EXPECTED    CHI^2

  1      you             3120        443.0232       16175.68??
  2      a               2390       3132.845?         176.1401
  3      the             2220       8098.8562        4267.3864
  4      i               2178        697.4313        3143.0822
  5      that            1775       1428.4331          84.0844
  6      is              1728       1361.5616          98.6199
  7      it              1716       1180.4965         242.9182
  8      what            1692        257.2393        8002.4256
  9      to              1439       3525.4456        1234.8099
 10      and             1206       3889.8660        1851.7716
 11      he               982       1286.6009          72.1138
 12      are              948        592.2706         213.6582
 13      do               942        183.7616        3128.6483
 14      in               906       2877.2242        1350.5116
 15      don't            895         65.9277       10425.9840
 16      no               888        296.7420        1178.08??
 17      that's           883         25.0768       29351.1390
*18      uh               830           .8089      849961.1800
 19      on               786        908.9661          16.6350
 20      this             717        693.7911            .7764
 21      know             687         92.0830        3843.554?
*22      huh              675           .6741      674544.????
 23      have             650        531.3313          26.5037
 24      go               630         84.3982        3527.1042
 25      there            599        367.2536         146.2379
 26      your             590        124.4402        1741.7?61
 27      we               572        357.6813         128.4175
 28      did              543        140.7536        1149.5423
 29-68   [rows illegible in the source]
 69      [illegible]      208        282.58??          19.6862
 70      [illegible]      207         48.5357         517.3701
 71      who              207        303.6178          30.7459
 72      her              206        409.4527         101.093?
 73      look             202         53.7938         408.320?
 74      eat              200          8.2241        4471.973?
 75      was              200       1323.4072         953.6322
*76      daddy            185           .5393       63094.1140
 77      say              182         67.9500         191.4261
 78      think            181         58.3777         257.5682
 79      [illegible]      174        108.8009          39.0707
 80      him              174        487.9188         201.9701
 81      he's             171         16.852?        1409.947?
 82-91   [rows illegible in the source]
 92      you're           148         20.3580         800.2968
 93      house            147         79.6795          56.8786
*94      lookit           144           .4045       50980.2180
 95      would            143        365.9054         135.7914
 96      more             142        298.7643          82.255?
 97      book             130         26.0205         415.5075
 98      girl             128         29.6607         326.0412
*99      gonna            128          2.1571        7341.3689
100      tape             128          4.7188        3220.8247

* Indicates words that seem special to the ERICA
corpus. Some of these are not peculiar to ERICA but
rather are seldom found in written English.

+ Indicates words that were spelled differently in
[K-F] than in ERICA. For example, ERICA uses 'ok',
but the preferred English is 'okay'.

(A ? marks a digit that is illegible in the source.)
The only word in the first 100 words in ERICA not
occurring in [K-F] at all was the word 'Erica', so actually
this list goes to rank 101 from the original list. A
number of words, especially proper nouns, seem special to
ERICA, and these words (starred in Table 6) contribute the
bulk of the enormous chi-square sum of 2,347,036. Striking
these special words from the data, and recalculating,
yields a chi-square sum of 208,000. This is still
unacceptable, but it indicates that it may be possible to
isolate some of the differences between written and spoken
English. For example, some of the difficult chi-square
contributions in the second run come from the high
frequencies of contractions in ERICA. The word 'what's'
contributes about 40,000, and 'that's' contributes some
31,000 to the 208,000 chi-square for the second run; these
two words are the most generous contributors.

VII. DICTIONARY CONSTRUCTION

A conceptually important fact about the syntactic
study undertaken in this work is that words were put into
grammatical categories apart from the contexts in which
they arose. This differs from the technique used by
Elizabeth Gammon in her study of basal readers (3).
Dr. Gammon looked at each sentence individually, and gave
each sentence a sentence type based on how it appeared
that the words functioned in that sentence. Of course,
given words may well be used differently from sentence to
sentence, and this occurred in Gammon's work.

When a word functions differently in different
sentences, I call the word lexically ambiguous. This
phenomenon is illustrated by the sentences:

1) There is snow on the ground.

2) It will snow tomorrow.

According to the usual grammatical categories, the word
'snow' is a noun in 1) and a verb in 2).

The real difficulty with classifying the words
individually in each sentence, as Gammon did, is that it
leaves unanalyzed the crucial task of how one knows when a
word is performing one syntactic function and not another.
Lexical ambiguity is very widespread if one takes as a
measure the number of multiple listings that words have in
standard dictionaries. A theory of language must begin to
account for the ubiquitous ambiguity of natural language in
some way that makes it more than merely tiresome.

My partial solution is to create a dictionary for
ERICA with multiple listings for a good portion of the
words. In doing so I have not included all of the
possibilities, or even all the ones that are probably
represented in ERICA. To have done so would have obscured
the results. The point is to implement in some detail a
theory of lexical ambiguity, and to show how it might work
in many cases, without letting the details become
burdensome. With 78,000 word occurrences in ERICA, every
occurrence of every word cannot be examined readily.

(3) A Syntactic Study of First-Grade Readers, by
Elizabeth Macken Gammon, Technical Report No. 155, June
22, 1970, IMSSS.

NOTATION: In the dictionary, each word is
associated with a grammatical classification string. This
string may be one classification; e.g., 'n' stands for
noun in the dictionary. Or the classification string may
be several classifications separated by commas. 'n,v'
would be used for a word that could be either a noun or a
verb.

Sometimes words (i.e., strings of word characters)
are contractions. The pedestrian view is that contractions
are two or more words that have been run together. For
example, 'you' is a personal pronoun, and hence has the
classification 'persp'. Supposing 'have' is a verb,
it would have the classification 'v'. The word
'you've' is the contraction of 'you' with 'have' and
has the classification 'persp#v'. (The symbol '#' stands
for a space in the classification.) This notation merely
says that 'you've' is to be thought of as 'you have'.
The situation is, however, complicated by the fact that
'have' can be either a verb ("to possess") or an auxiliary
verb and is thus classified 'v,aux'. This means that
'you've' can be 1) a personal pronoun followed by a verb,
or 2) a personal pronoun followed by an auxiliary. The
correct classification is therefore 'persp#v,persp#aux'.
To illustrate this in a sentence, consider:

You've seen him today.

Looking at the relevant portion of the dictionary:

WORD       GRAMMATICAL CLASSIFICATION

him        persp
seen       v
today      adv     (adv is the symbol for adverb)
you've     persp#v,persp#aux

Using a program written for the task, I look up the
classifications and obtain

1) persp#v,persp#aux v persp adv

as the ambiguous lexical form for the sentence. The
ambiguous lexical form 1) is shorthand for saying that the
sentence is either

2) persp v v persp adv

or

3) persp aux v persp adv

The strings 2) and 3) are called alternative terminal
forms for the sentence. If the lexical form has only one
alternative form, then I shall call it the terminal form.
The phrase 'terminal form' thus refers not to the original
utterance but rather to the result of replacing the words
in the utterance by their respective grammatical
classifications in the dictionary. The Gammon method would
have classed 1) as 3), thus bypassing the lexical ambiguity
that allows 2) as an alternative.
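
The step from an ambiguous lexical form to its alternative terminal forms can be sketched in Python. The dictionary fragment is the one given in the example; the function names are hypothetical, and the sketch assumes utterances are already split on spaces with no punctuation attached to the words.

```python
from itertools import product

# Dictionary fragment from the example: commas separate alternative
# classifications, and '#' stands for a space within a classification.
DICTIONARY = {
    "him": "persp",
    "seen": "v",
    "today": "adv",
    "you've": "persp#v,persp#aux",
}


def lexical_form(utterance: str) -> list[str]:
    """Replace each word by its classification string from the dictionary."""
    return [DICTIONARY[word] for word in utterance.lower().split()]


def terminal_forms(utterance: str) -> list[str]:
    """Expand the ambiguous lexical form into alternative terminal forms."""
    alternatives = [entry.split(",") for entry in lexical_form(utterance)]
    return [
        " ".join(choice).replace("#", " ")
        for choice in product(*alternatives)
    ]
```

For the utterance "You've seen him today" this yields the two strings labelled 2) and 3) above. The number of alternative terminal forms is the product of the number of listings for each word, which is why the dictionary was deliberately kept from listing every possibility.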

Dr. Gammon has told me privately she assumes that
every utterance has a single terminal form, or at least a
best one given the context of its use. While this
assumption is useful, it is unsettling to me to leave the
determination of the "best" terminal form as a part of the
given upon which a linguistic experiment rests. In
Chapters 4 and 6 I try to resolve the natural ambiguities
that arise from using the same words in different ways, so
to a certain extent I am trying to use this assumption.
Even so, Gammon's assumption is entirely too simple. It
assumes that ambiguities are only apparent, that an
adequate theory would always make a single selection. When
I have laid out the necessary formal details, I shall try
to argue that ambiguity plays a forceful and important role
in natural language.

I have tried to give a reasonable sample of lexical
ambiguity in my dictionary, but I certainly have not been
as thorough as the most meager commercially available
dictionary.

VIII. WORD CLASSIFICATIONS

Each word in the ERICA vocabulary has a grammatical
classification string associated with it, according to the
conventions described in VII above. Appendix 4 gives the
dictionary for the complete corpus.

The same symbols are used for the ERICA and the ADULT
dictionaries. This is not to say that all the speakers
necessarily have the same grammar or use language in the
same way. The point is that they communicate, and our best
hope of understanding how is to assume a common lexicon.

I include here both the fundamental syntactic
categories and the entries that indicated multiple
classification. Table 7 gives the categories and their
intuitive meanings. Table 8 gives the entries as I have
them in the dictionary. Table 9 breaks down the multiple
classifications into the fundamental categories, counting
for example words that could be used as nouns. Hence the
numbers in Table 9 do not sum up to the total number of
types in ERICA, which is 3,168.
TABLE 7

FUNDAMENTAL SYMBOLS USED IN THE DICTIONARY FOR ERICA AND THEIR
INTUITIVE MEANINGS (*)

SYMBOL     MEANING AND EXPLANATION                  EXAMPLE(S)

adj        common adjectives                        good
adv        adverbs                                  well softly
aff        affirmative words                        yes uhuh
art        articles                                 a an the
aux        auxiliary verbs                          have did be
conj       conjunctions                             and but
int        interjections                            bye siarn
intadv     interrogative adverbs                    how when
inter      interrogative pronouns                   who whom
link       linking verbs                            be (and its inflections)
misc       miscellaneous words that                 diller shafto
           defy classification
           (examining the contexts
           was unilluminating)
mod        modal verbs                              can cause wanna
n          common nouns                             house cat
neg        negating words                           no not
padj       possessive adjectives                    bear's erica's
           made from either common
           or proper nouns
persp      personal pronouns                        i you him
pn         proper nouns                             africa tois
prep       preposition                              except from
pron       pronouns other than                      anything someone
           personal and interrogative
pronadj    adjectival form of                       his somebody's
           a pronoun
qu         quantifying words                        all both
           and cardinal numbers                     one two
v          verbs other than linking,                bake fit
           modal, and auxiliary
<undef>    for unintelligible words
           and phrases

* Recall that uppercase letters are mapped into
lowercase.
TABLE 8

NUMBER OF WORD TYPES CLASSIFIED IN VARIOUS LEXICAL CATEGORIES
INCLUDING FUNDAMENTAL AND COMPLEX SYMBOLS

SYMBOL                       CORPUS    ERICA    ADULT

n                             1462      878     1337
v                              651      354      601
adj                            305      139      291
pn                             161       96      143
adv                             86       35       81
int                             76       58       47
padj,n#aux,n#link               72       32       54
n,v                             36       20       32
qu,pron                         34       27       33
padj,pn#aux,pn#link             30       18       25
prep                            23       16       22
misc                            21       11       13
pron                            19       13       15
mod                             18       17       17
conj                            16        8       16
persp                           16       15       15
aff                             15       12       10
pronadj                         13       10       12
prep,adv                        10        9       10
link,aux                         8        7        8
persp#mod                        8        5        8
mod#neg                          7        5        7
persp#aux,persp#link             7        7        7
v,mod                            7        6        6
pron#aux,pron#link               6        5        5
aux#neg,link#neg                 5        4        4
neg                              5        5        5
v,aux                            5        5        5
intadv                           4        4        4
v#neg,mod#neg                    4        3        4
art                              3        3        3
inter#aux,inter#link             3        3        2
n,adj                            3        1        3
persp#v,persp#aux                3        0        3
qu                               3        3        3
inter                            2        2        2
mod#persp                        2        2        2
prep,conj                        2        2        2
pron#mod                         2        1        1
<undef>                          2        2        1
adv#link                         1        1        1
intadv#mod                       1        1        0
intadv#link                      1        0        1
intadv#aux,intadv#link           1        1        1
inter#mod                        1        1        0
inter#persp                      1        1        0
inter,persp                      1        0        1
mod#pron                         1        1        0
n,adv                            1        1        1
padj                             1        1        1
persp,pronadj                    1        1        1
pron#aux                         1        0        1
v#persp                          1        1        0
TABLE 9

FUNDAMENTAL SYMBOLS AND CONCATENATIONS IN THE ERICA DICTIONARY

SYMBOL                    FREQUENCY

                 CORPUS    ERICA    ADULT

n                 1502      900     1373
v                  699      385      644
adj                308      140      294
pn                 161       96      143
padj               103       51       80
adv                 97       45       92
int                 76       58       47
n#aux               72       32       54
n#link              72       32       54
pron                53       40       48
qu                  37       30       36
prep                35       27       34
pn#aux              30       18       25
pn#link             30       18       25
mod                 25       23       23
misc                21       11       13
conj                18       10       18
persp               18       16       17
aff                 15       12       10
pronadj             14       11       13
aux                 13       12       13
mod#neg             11        8       11
persp#aux           10        7       10
link                 8        7        8
persp#mod            8        5        8
persp#link           7        7        7
pron#aux             7        5        6
pron#link            6        5        5
aux#neg              5        4        4
link#neg             5        4        4
neg                  5        5        5
intadv               4        4        4
v#neg                4        3        4
art                  3        3        3
inter#aux            3        3        2
inter#link           3        3        2
inter                3        2        3
persp#v              3        0        3
intadv#link          2        1        2
mod#persp            2        2        2
pron#mod             2        1        1
<undef>              2        2        1
adv#link             1        1        1
intadv#mod           1        1        0
intadv#aux           1        1        1
inter#mod            1        1        0
inter#persp          1        1        0
mod#pron             1        1        0
v#persp              1        1        0

Totals*          3,509    2,055    3,153

* The counts in this table represent the number of
words that could take a certain grammatical class
(fundamental or concatenation). Hence, the sums are
greater than the actual number of words in the appropriate
portion of the corpus.
IX. GOODNESS-OF-FIT TESTS ON THE ERICA
AND ADULT DICTIONARIES

It is a reasonable hypothesis that the adult and
child have similar frequencies of usage of words. Using
the common 1,552 words of the ERICA and ADULT vocabularies,
I constructed a 2 x 1,552 contingency table, and found
that this hypothesis was untenable. With 1,551 degrees of
freedom, the chi-square was 13,1?9.04?0, so the hypothesis
must be rejected at any reasonable level of significance.

While Erica and the adults do not use individual
words with similar relative frequencies, they use words
from the various grammatical categories in similar
proportions. Thus, while the words 'dog' and 'cat' may,
for example, be used more often by Erica than by the
adults, nouns (any nouns) are used similarly. Table 10
gives that contingency table, showing a chi-square of
53.7626 for 53 degrees of freedom, roughly significant to
50 percent, obtained by taking the observed frequencies
from the complete corpus as a predictor of the frequency in
the ERICA portion alone. Table 11 shows the same results
for predicting the ADULT frequencies from the complete
corpus. This includes the grammatical classes that had
fewer than 5 members, a practice that is usually bad for
chi-square.
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TABLE 10 

PREDICTING ERICA LEXICAL CLASSES FROM ADULT LEXICAL CLASSES

LEXICAL CATEGORY           Adult      Erica      Erica      Chi-
                        Observed   Observed   Expected    square

n                           1337        878     864.13       .22
v                            601        354     388.44      3.05
adj                          291        139     188.08     12.81
pn                           143         96      92.42       .14
adv                           81         35      52.35      5.75
int                           47         58      30.38     25.12
padj,n#aux,n#link             54         32      34.90       .24
[illegible]                   32         20      20.68       .02
qu,pron                       33         27      21.33      1.51
padj,pn#aux,pn#link           25         18      16.16       .21
prep                          22         16      14.22       .22
misc                          13         11       8.40       .80
pron                          15         13       9.69      1.13
mod                           17         17      10.99      3.29
conj                          16          8      10.34       .53
persp                         15         15       9.69      2.90
aff                           10         12       6.46      4.74
pronadj                       12         10       7.76       .65
prep,adv                      10          9       6.46      1.00
link,aux                       8          7       5.17       .65
persp#mod                      8          5       5.17       .01
mod#neg                        7          5       4.52       .05
persp#aux,persp#link           7          7       4.52      1.35
v,mod                          6          6       3.88      1.16
pron#aux,pron#link             5          5       3.23       .97
aux#neg,link#neg               4          4       2.59       .77
neg                            5          5       3.23       .97
v,aux                          5          5       3.23       .97
intadv                         4          4       2.59       .77
v#neg,mod#neg                  4          3       2.59       .07
art                            3          3       1.94       .58
inter#aux,inter#link           2          3       1.29      2.26
n,adj                          3          1       1.94       .45
persp#v,persp#aux              3          0       1.94      1.94
qu                             3          3       1.94       .58
inter                          2          2       1.29       .39
mod#persp                      2          2       1.29       .39
prep,conj                      2          2       1.29       .39
pron#mod                       1          1        .65       .19
<undef>                        1          2        .65      2.84
adv#link                       1          1        .65       .19
[illegible]                    0          1       0.00      1.00
intadv#link                    1          0        .65       .65
intadv#aux,intadv#link         1          1        .65       .19
inter#mod                      0          1       0.00      1.00
inter#persp                    0          1       0.00      1.00
inter,persp                    1          0        .65       .65
mod#pron                       0          1       0.00      1.00
n,adv                          1          1        .65       .19
padj                           1          1        .65       .19
persp,pronadj                  1          1        .65       .19
pron#aux                       1          0        .65       .65
v#persp                        0          1       0.00      1.00

observed sum 1 = 2,867
observed sum 2 = 1,853
expected sum   = 1,853.00
chi-square sum = 89.98
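
The Expected and Chi-square columns of Table 10 appear to follow from scaling each Adult count by the ratio of the two column totals. The sketch below checks that reading against the first row of the table. The function and argument names are illustrative, and the scaling rule is an inference from the printed figures rather than something the text states.

```python
def expected_and_chi(predictor_count, observed_count,
                     predictor_total, observed_total):
    """Scale the predictor count to the observed total, then form one chi-square term."""
    expected = predictor_count * observed_total / predictor_total
    # Rows whose expected value is 0 are printed with a conventional value
    # in the table; that convention is not modelled here.
    chi = (observed_count - expected) ** 2 / expected
    return expected, chi


# Row "n": Adult observed 1337, Erica observed 878,
# column totals 2,867 (Adult) and 1,853 (Erica).
expected, chi = expected_and_chi(1337, 878, 2867, 1853)
print(round(expected, 2), round(chi, 2))
```

Summing the chi-square terms over all rows gives the table's overall statistic; rows with an expected value of 0 would need separate handling.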



TABLE 11

PREDICTING ADULT LEXICAL CLASSES FROM ERICA LEXICAL CLASSES

LEXICAL CATEGORY           Erica      Adult      Adult      Chi-
                        Observed   Observed   Expected    square

n                            878       1337    1336.44       .00
v                            354        601     538.84      7.17
adj                          139        291     211.58     29.81
pn                            96        143     146.13       .07
adv                           35         81      53.27     14.43
int                           58         47      88.28     19.31
padj,n#aux,n#link             32         54      48.71       .57
[illegible]                   20         32      30.44       .08
qu,pron                       27         33      41.10      1.60
padj,pn#aux,pn#link           18         25      27.40       .21
prep                          16         22      24.35       .23
misc                          11         13      16.74       .84
pron                          13         15      19.79      1.16
mod                           17         17      25.88      3.04
conj                           8         16      12.18      1.20
persp                         15         15      22.83      2.69
aff                           12         10      18.27      3.74
pronadj                       10         12      15.22       .68
prep,adv                       9         10      13.70      1.00
link,aux                       7          8      10.65       .66
persp#mod                      5          8       7.61       .02
mod#neg                        5          7       7.61       .05
persp#aux,persp#link           7          7      10.65      1.25
v,mod                          6          6       9.13      1.07
pron#aux,pron#link             5          5       7.61       .90
aux#neg,link#neg               4          4       6.09       .72
neg                            5          5       7.61       .90
v,aux                          5          5       7.61       .90
intadv                         4          4       6.09       .72
v#neg,mod#neg                  3          4       4.57       .07
art                            3          3       4.57       .54
inter#aux,inter#link           3          2       4.57      1.44
n,adj                          1          3       1.52      1.43
persp#v,persp#aux              0          3       0.00      9.00
qu                             3          3       4.57       .54
inter                          2          2       3.04       .36
mod#persp                      2          2       3.04       .36
prep,conj                      2          2       3.04       .36
pron#mod                       1          1       1.52       .18
<undef>                        2          1       3.04      1.37
adv#link                       1          1       1.52       .18
[illegible]                    1          0       1.52      1.52
intadv#link                    0          1       0.00      1.00
intadv#aux,intadv#link         1          1       1.52       .18
inter#mod                      1          0       1.52      1.52
inter#persp                    1          0       1.52      1.52
inter,persp                    0          1       0.00      1.00
mod#pron                       1          0       1.52      1.52
n,adv                          1          1       1.52       .18
padj                           1          1       1.52       .18
persp,pronadj                  1          1       1.52       .18
pron#aux                       0          1       0.00      1.00
v#persp                        1          0       1.52      1.52

observed sum 1 = 1,942
observed sum 2 = 2,956
expected sum   = 2,909.53
chi-square sum = 212.14
CHAPTER 3 — FORMAL DEVELOPMENTS

I. GENERATIVE GRAMMARS

This chapter is devoted to standard concepts and
results of the theory of generative grammars as well as
some notational matters.

Let V be a set of symbols. Then, V* is the set
of all finite sequences of elements of V, including the
empty string, which is denoted by e. Such finite
sequences are sometimes called strings.

V+ denotes V* - { e }. Small letters a,b,c are
variables ranging over members of V*.

A structure

    G = <V,T,S,P>

is a generative grammar just in case G satisfies the
conditions:

    1) V is a finite nonempty set of symbols,
       the vocabulary;

    2) T is a nonempty subset of V, known as the
       terminal vocabulary;

Then, let the nonterminal vocabulary VN = V-T.

    3) S is a distinguished element of VN, called
       the start symbol;

    4) P, the set of productions or rules,
       is a finite subset of the set V+ X V*.

Let T+ be the set of all finite non-empty terminal
strings. Further, if <a,b> is in P, then I write (informally)

    a -> b

to indicate that this is a production in P. The symbol a
is the left-hand side (lhs) of <a,b> and b is the
right-hand side (rhs) of <a,b>.

If a,b are strings in V*, then b is immediately
produced from a if and only if there is a subsequence a'
in a and a subsequence b' in b such that b is the
result of substituting b' in a for a', and such that

    a' -> b'

is a rule in P. The intuition here is that an immediate
production is what one obtains by replacing, in some
string, the left-hand side of some production by the
right-hand side of that production.

If a,b are in V*, then b is derivable from a if
and only if there exist

    a1, a2, ..., an    for some n

such that

    a (immediately) produces a1
    a1 produces a2
    a2 produces a3
    ...
    an produces b.

The sequence <a,a1> <a1,a2> ... <an,b> is called a
derivation of b from a.

As an example of these ideas, consider the
following grammar G that generates a few English
sentences.

    G = <V,T,S,P>

where

    V = {S,NP,VP,N,ART,V,a,the,boy,girl,sees,knows,runs}

    T = {a,the,boy,girl,sees,knows,runs};
    hence, the set VN of non-terminals is
    VN = {S,NP,VP,N,ART,V};
    S is the start symbol (for sentence)
    and P contains the rules

    S -> NP VP
    NP -> N
    NP -> ART N
    VP -> V
    VP -> V NP

    N -> boy       N -> girl

    ART -> a       ART -> the

    V -> runs      V -> sees      V -> knows

Hence, S produces NP VP. Also, the string
    the boy

is derivable from the string NP. This relationship is
denoted by

    NP => the boy
       G

where G (a reference to the grammar) may be omitted when
the grammar is clear.

The set of noun phrases is the set of all terminal
strings derivable from the symbol NP. What we are
interested in is the set of terminal strings in T+ that is
derivable from the start symbol, i.e.,

    { a in T+ | S => a }

This is the language of the grammar G, denoted by L(G).
Usually, when I say 'derivation' I mean derivation from the
start symbol to a terminal string. If grammars G1 and G2
are such that L(G1)=L(G2), then G1 is said to be equivalent
to G2.

The following strings are in L(G):

    boy runs

    the boy runs

    the boy sees the girl

    the girl sees the boy
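
Because the example grammar G has no recursive rules, L(G) is finite and can be listed exhaustively. The following is a minimal sketch, assuming the rules of G are held in a Python dictionary; the names `RULES`, `expand` and `LANGUAGE` are illustrative and not part of the original work.

```python
from itertools import product

# Productions of the example grammar G, keyed by left-hand side.
RULES = {
    "S": [["NP", "VP"]],
    "NP": [["N"], ["ART", "N"]],
    "VP": [["V"], ["V", "NP"]],
    "N": [["boy"], ["girl"]],
    "ART": [["a"], ["the"]],
    "V": [["runs"], ["sees"], ["knows"]],
}


def expand(symbol):
    """Return every terminal string (a tuple of words) derivable from symbol."""
    if symbol not in RULES:  # a terminal derives only itself
        return [(symbol,)]
    results = []
    for rhs in RULES[symbol]:
        # Combine the expansions of the right-hand-side symbols in order.
        for parts in product(*(expand(s) for s in rhs)):
            results.append(tuple(word for part in parts for word in part))
    return results


LANGUAGE = {" ".join(words) for words in expand("S")}
print(len(LANGUAGE), "the boy sees the girl" in LANGUAGE)
```

This exhaustive expansion terminates only because G is non-recursive; a grammar with a recursive rule would need a bound on string length.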



Notice that the definition of derivation allows
several sequences that are derivations for 'boy runs'. Two
of them are:

    1) <S,NP VP> <NP VP,N VP> <N VP,N V> <N V,boy V>
       <boy V,boy runs>

    2) <S,NP VP> <NP VP,NP V> <NP V,NP runs> <NP runs,N runs>
       <N runs,boy runs>

In the above, 1) and 2) differ only in the order
that the rules are applied, and they seem to be "one
derivation in two different orders". What is needed is a
notion of "derivation" that selects only one of these. The
notion I use is that of a left-most derivation.

A derivation is a left-most derivation just in
case, in each pair of the sequence, the substitution is
made for the left-most possible sequence of symbols from
which a substitution could be made. Notice that 1) is



concept. The concept of left-most derivation is not
readily useful with all kinds of grammars.

Different kinds of generative grammars are obtained
by putting restrictions on the production rules that may be
in P. A type-0 or recursively enumerable grammar has no
further restrictions placed upon it. A type-1 or
context-sensitive grammar has only the restriction that if
<a,b> is in P then |b| >= |a|, where |a| is the number of
symbols in a, the length of a. A type-2 or context-free
grammar is context-sensitive plus if <a,b> is in P then
|a|=1; further, only non-terminals may occur on the
left-hand side of a production. (In fact, it is
sometimes the practice to define the classes of terminals
and non-terminals from the productions in a context-free
grammar. This is the way a compiler would handle the
compilation of a program in, say, ALGOL.) Notice that the
above grammar G is context-free. A type-3 or regular
grammar is context-free, plus if <a,b> is in P then b is
either of the form

    t

or of the form

    tN

where t is a terminal and N is a non-terminal. In
addition, other grammars of various intermediate strengths
are possible.
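
The context-free and regular restrictions can be tested mechanically on a finite rule set. The following is a minimal sketch, assuming each production is stored as a pair of tuples (lhs, rhs). The function names and the small regular grammar `R` are hypothetical illustrations, not taken from the text.

```python
def is_context_free(productions, nonterminals):
    """True if every lhs is a single non-terminal and no rhs is empty."""
    return all(
        len(lhs) == 1 and lhs[0] in nonterminals and len(rhs) >= 1
        for lhs, rhs in productions
    )


def is_regular(productions, nonterminals):
    """True if the grammar is context-free and every rhs is t or t N."""
    def rhs_ok(rhs):
        if len(rhs) == 1:
            return rhs[0] not in nonterminals
        return (len(rhs) == 2
                and rhs[0] not in nonterminals
                and rhs[1] in nonterminals)
    return is_context_free(productions, nonterminals) and all(
        rhs_ok(rhs) for _, rhs in productions
    )


# The example grammar G as (lhs, rhs) pairs.
VN = {"S", "NP", "VP", "N", "ART", "V"}
G = [
    (("S",), ("NP", "VP")),
    (("NP",), ("N",)), (("NP",), ("ART", "N")),
    (("VP",), ("V",)), (("VP",), ("V", "NP")),
    (("N",), ("boy",)), (("N",), ("girl",)),
    (("ART",), ("a",)), (("ART",), ("the",)),
    (("V",), ("runs",)), (("V",), ("sees",)), (("V",), ("knows",)),
]

# Hypothetical regular grammar for strings of the form adj ... adj n.
RN = {"A"}
R = [(("A",), ("adj", "A")), (("A",), ("n",))]

print(is_context_free(G, VN), is_regular(G, VN), is_regular(R, RN))
```

G passes the context-free test but fails the regular one, since a rule such as S -> NP VP has two non-terminals on its right-hand side.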

I am concerned exclusively with context-free
grammars. These grammars are easily created, and parsing
programs can be easily written for context-free grammars.
(Usually I say 'cfg' for 'context-free grammar', 'cfl' for
'context-free language'.) Moreover, set-theoretical
semantics applies very naturally to cfg.

For cfg, it can be shown that if a string has any
derivation, then it has a left-most derivation. The sense
of "one derivation in several different orders" is
correctly captured by the notion of left-most derivation.
When I say 'derivation', unless otherwise noted, I mean
'left-most derivation'.

II. THE RELATION OF GENERATIVE GRAMMARS TO AUTOMATA

A conceptually important fact is that the relation
between the theory of generative grammars and the theory of
automata is well understood (1). I shall say that an
automaton recognizes a language if and only if the
automaton, given an input string, stops and returns a TRUE
if the string is in the language. In particular, regular
languages are representable by finite automata (and
conversely); and context-free languages are representable
by push-down automata (and conversely). Every
context-sensitive language is recognized by some Turing
machine that always halts, so that context-sensitive
languages are recursive. The converse is not the case,
however, since there are recursive sets that are not
context-sensitive languages. Each type-0 language is
recognized by some Turing machine, but the machine may not
necessarily halt on a string not in the set in question
(hence the name "recursively enumerable").

(1) See, for example, Hopcroft and Ullman, Formal
Languages and Their Relation to Automata, Reading, Mass.,
1969.

III. DERIVATIONS AND TREES

While the notion of a left-most derivation is the
formal definition of 'derivation' that I want to use,
informally the concept of a tree (2) is far superior. I
take it that the idea of a tree is sufficiently intuitive
to require no further explanation, except to give a few
examples.

In the above example of the cfg G, consider the
derivation of 'boy runs'. This can be represented by the
tree

[Tree diagram: S branching to NP and VP, with the words of the sentence as leaves]

(2) See [Suppes-2] for a tree-oriented approach.

Note that each of the (non-left-most) derivations yields
this same tree. It is possible to define the notion of
tree and proceed to show that, for cfg, there is a one-one
correspondence between left-most derivations and trees.

It may happen that there are two or more left-most
derivations for a string according to a cfg. Consider the
grammar G' obtained from G above by adding the rule

    S -> ART N VP

Then, the sentence

    the boy runs

has two left-most derivations:

    1) <S,NP VP> <NP VP,ART N VP> <ART N VP,the N VP>
       <the N VP,the boy VP>
       <the boy VP,the boy V> <the boy V,the boy runs>

    2) <S,ART N VP> <ART N VP,the N VP> <the N VP,the boy VP>
       <the boy VP,the boy V> <the boy V,the boy runs>

Each derivation is represented by a different tree, viz.:

When a string has two or more (left-most) derivations, the
string is said to be grammatically ambiguous. A grammar
G is grammatically ambiguous if and only if some string
in L(G) is grammatically ambiguous.
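
Since left-most derivations correspond one-to-one with trees, grammatical ambiguity can be checked by counting the trees a grammar assigns to a string. The sketch below does this for G and for G' (G plus the rule S -> ART N VP). It assumes a grammar with no empty right-hand sides and no cyclic unit rules; the names `count_trees`, `G_RULES` and `G_PRIME_RULES` are illustrative.

```python
from functools import lru_cache

G_RULES = {
    "S": [("NP", "VP")],
    "NP": [("N",), ("ART", "N")],
    "VP": [("V",), ("V", "NP")],
    "N": [("boy",), ("girl",)],
    "ART": [("a",), ("the",)],
    "V": [("runs",), ("sees",), ("knows",)],
}
# G' adds one rule for S.
G_PRIME_RULES = dict(G_RULES, S=[("NP", "VP"), ("ART", "N", "VP")])


def count_trees(words, rules, start="S"):
    """Count the distinct trees that derive the word sequence from start."""
    words = tuple(words)

    @lru_cache(maxsize=None)
    def trees_for_symbol(symbol, i, j):
        if symbol not in rules:  # terminal: must match exactly one word
            return 1 if j == i + 1 and words[i] == symbol else 0
        return sum(trees_for_sequence(rhs, i, j) for rhs in rules[symbol])

    @lru_cache(maxsize=None)
    def trees_for_sequence(rhs, i, j):
        if not rhs:
            return 1 if i == j else 0
        head, rest = rhs[0], rhs[1:]
        total = 0
        # Each symbol covers at least one word, so leave room for the rest.
        for k in range(i + 1, j - len(rest) + 1):
            total += trees_for_symbol(head, i, k) * trees_for_sequence(rest, k, j)
        return total

    return trees_for_symbol(start, 0, len(words))


sentence = "the boy runs".split()
print(count_trees(sentence, G_RULES), count_trees(sentence, G_PRIME_RULES))
```

A count greater than one for any string in the language shows that the grammar is grammatically ambiguous in the sense defined above.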

As a notational device, partition the set P of
productions into rule classes such that all elements of the
same rule class have the same lhs. Then number
(arbitrarily) the classes in the partition so that each lhs
has a number i, and further, give each rule in each class
a number j. Thus, each rule is uniquely represented by the
pair (i,j), called the label of the rule; and all rules
having the same lhs have the same number i, and no two
rules with i as the first element of the label have the
same number j as the second element of the label. It is
then possible to denote a derivation by a sequence of
labels (assuming that we are starting with the start symbol
and that the derivation will be left-most).

If I label the rules in G by this scheme:

    (1,1)    s -> np vp
    (2,1)    np -> n
    (2,2)    np -> art n
    (3,1)    vp -> v
    (3,2)    vp -> v np
    (4,1)    art -> a
    (4,2)    art -> the
    (5,1)    n -> boy
    (5,2)    n -> girl
    (6,1)    v -> runs
    (6,2)    v -> sees
    (6,3)    v -> knows

then the left-most derivation of 'the boy sees the girl'
may be represented by the label sequence

    (1,1) (2,2) (4,2) (5,1) (3,2) (6,2) (2,2) (4,2) (5,2).
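
A label sequence can be replayed mechanically: at each step the rule named by the label is applied to the left-most non-terminal. The following is a minimal sketch under that reading; the names `LABELLED` and `replay` are illustrative.

```python
# Rules of G under the labelling scheme: label -> (lhs, rhs).
LABELLED = {
    (1, 1): ("s", ["np", "vp"]),
    (2, 1): ("np", ["n"]),
    (2, 2): ("np", ["art", "n"]),
    (3, 1): ("vp", ["v"]),
    (3, 2): ("vp", ["v", "np"]),
    (4, 1): ("art", ["a"]),
    (4, 2): ("art", ["the"]),
    (5, 1): ("n", ["boy"]),
    (5, 2): ("n", ["girl"]),
    (6, 1): ("v", ["runs"]),
    (6, 2): ("v", ["sees"]),
    (6, 3): ("v", ["knows"]),
}
NONTERMINALS = {lhs for lhs, _ in LABELLED.values()}


def replay(labels, start="s"):
    """Apply each labelled rule to the left-most non-terminal in turn."""
    form = [start]
    for label in labels:
        lhs, rhs = LABELLED[label]
        # Position of the left-most non-terminal in the current string.
        pos = next(i for i, sym in enumerate(form) if sym in NONTERMINALS)
        if form[pos] != lhs:
            raise ValueError(f"rule {label} does not apply to {form[pos]!r}")
        form[pos:pos + 1] = rhs
    return " ".join(form)


SEQUENCE = [(1, 1), (2, 2), (4, 2), (5, 1), (3, 2),
            (6, 2), (2, 2), (4, 2), (5, 2)]
print(replay(SEQUENCE))
```

The check on `form[pos]` rejects any label sequence that is not a left-most derivation.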

IV. CHOMSKY NORMAL FORM GRAMMARS

If a cfg G is such that each rule in P is either
of the form

    A -> a

or of the form

    A -> B C

then G is said to be in Chomsky normal form. Every cfg
has an equivalent grammar that is in Chomsky normal form.
Moreover, it is possible, given a Chomsky normal form
grammar G' that represents a grammar G, to obtain a
derivation in G from a derivation in G'.
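
Whether a given rule set is already in Chomsky normal form is a purely syntactic check. The following is a minimal sketch, assuming rules are stored as (lhs, rhs) pairs with rhs a tuple of symbols. The function name and the small grammar `CNF_EXAMPLE` are hypothetical.

```python
def is_chomsky_normal_form(productions, nonterminals):
    """True if every rule is A -> a (one terminal) or A -> B C (two non-terminals)."""
    for lhs, rhs in productions:
        one_terminal = len(rhs) == 1 and rhs[0] not in nonterminals
        two_nonterminals = len(rhs) == 2 and all(s in nonterminals for s in rhs)
        if lhs not in nonterminals or not (one_terminal or two_nonterminals):
            return False
    return True


# The example grammar G of Section I.
VN = {"S", "NP", "VP", "N", "ART", "V"}
G = [
    ("S", ("NP", "VP")),
    ("NP", ("N",)), ("NP", ("ART", "N")),
    ("VP", ("V",)), ("VP", ("V", "NP")),
    ("N", ("boy",)), ("N", ("girl",)),
    ("ART", ("a",)), ("ART", ("the",)),
    ("V", ("runs",)), ("V", ("sees",)), ("V", ("knows",)),
]

# Hypothetical fragment that is in Chomsky normal form.
CNF_EXAMPLE = [("S", ("NP", "VP")), ("NP", ("boy",)), ("VP", ("runs",))]

print(is_chomsky_normal_form(G, VN),
      is_chomsky_normal_form(CNF_EXAMPLE, {"S", "NP", "VP"}))
```

G itself fails the check because rules such as NP -> N have a single non-terminal on the right-hand side.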

V. LEXICAL SIMPLIFICATION OF CONTEXT-FREE GRAMMARS

My syntactic theory for the ERICA corpus is highly
dependent on the use of a dictionary to classify words
according to grammatical categories. When an utterance is
to be parsed by the grammars I use, the utterance is first
converted to its lexical form (which may be 'shorthand' for
several alternative forms). The grammar then sees only the
alternative forms and never sees the original utterance.
The vocabulary V of the grammar does not contain the
actual words in the utterance but only symbols for the
grammatical categories, plus additional symbols.

It represents a philosophical-psychological
question as to whether the dictionary exists separately
from the grammar (as I believe) or as only a shorthand for
rules in the grammar. I will discuss this further in
Chapter 4.

I shall say that G admits of lexical simplification
just in case:

    1) there is a non-empty subset DP of the set
       of rules P such that for each p in DP, p is of
       the form

           A -> d

       where A is a non-terminal, and d is a terminal of G;

    2) let D = { d | A -> d is in DP }, called
       the set of lexical symbols. Then, no d in D occurs in
       any rule in P - DP.

Many of the grammars useful for natural language
admit of lexical simplification.
The gain, computationally, is that a different procedure
can be used on the set D of symbols than the procedure
used for the grammar as a whole, provided a large number of
symbols get put into the class D (3).

Clearly cfg exist that cannot be lexically
simplified. One such case is the grammar consisting of the
following productions:

    (1,1)    S -> A B
    (1,2)    S -> a C
    (1,3)    S -> b C
    (1,4)    S -> a
    (1,5)    S -> b

No non-empty set DP can be constructed, since the symbols
a and b occur in rules (1,2) and (1,3) respectively.
Hence, adding a lexicon to this grammar is impossible. A
different grammar for the same language would, perhaps,
allow a lexicon. But the lexicon should not change the
structure of derivations in the language, only simplify
them.
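
Conditions 1) and 2) can be tested directly on a finite rule set. The sketch below collects the largest set DP that the two conditions permit and reports whether it is non-empty. The function names are illustrative, and the second grammar is the example grammar G of Section I, included only for contrast.

```python
def lexical_rules(productions, nonterminals):
    """Return the largest DP: rules A -> d whose terminal d occurs in no other kind of rule."""
    candidates = [(lhs, rhs) for lhs, rhs in productions
                  if len(rhs) == 1 and rhs[0] not in nonterminals]
    others = [rule for rule in productions if rule not in candidates]
    # Symbols used by rules that cannot belong to DP.
    blocked = {sym for _, rhs in others for sym in rhs}
    return [(lhs, rhs) for lhs, rhs in candidates if rhs[0] not in blocked]


def admits_lexical_simplification(productions, nonterminals):
    return len(lexical_rules(productions, nonterminals)) > 0


# The grammar with productions (1,1) to (1,5).
COUNTEREXAMPLE = [
    ("S", ("A", "B")),
    ("S", ("a", "C")),
    ("S", ("b", "C")),
    ("S", ("a",)),
    ("S", ("b",)),
]

# The example grammar G of Section I.
G = [
    ("S", ("NP", "VP")),
    ("NP", ("N",)), ("NP", ("ART", "N")),
    ("VP", ("V",)), ("VP", ("V", "NP")),
    ("N", ("boy",)), ("N", ("girl",)),
    ("ART", ("a",)), ("ART", ("the",)),
    ("V", ("runs",)), ("V", ("sees",)), ("V", ("knows",)),
]

print(admits_lexical_simplification(COUNTEREXAMPLE, {"S", "A", "B", "C"}),
      admits_lexical_simplification(G, {"S", "NP", "VP", "N", "ART", "V"}))
```

For G every word-introducing rule lands in DP, so the seven words form the lexicon D.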

The conceptually interesting fact about lexical
simplification is that it can greatly reduce the 'parsing'
machinery when the surface language has a very large
vocabulary that can be classified (perhaps with great
overlapping) into a relatively small number of "grammatical
categories". Moreover, if this is happening we have, among
other things, the basis for probabilistic theories of
sentence production, based upon the probability of uttering
lexical forms rather than actual strings of words.

(3) Programming languages such as ALGOL-60 often
have their syntax defined in terms of context-free
grammars. According to such definitions, one would believe
that the parser for an ALGOL-60 compiler ran straight
through the derivation of the "program" during compilation.
In fact, this is not the case with any actual compiler I am
familiar with. In practice, compilers take advantage of
many things about the language in order to gain greater
efficiency. An example is the search for numbers and
arithmetic expressions in the program. This search is
customarily implemented by a different routine that looks
especially for expressions, and replaces them before the
actual parser sees them. This is analogous to having a
dictionary system for natural language.
CHAPTER 4 — A GRAMMAR FOR ERICA

I. THE SIMPLE MODEL

There is a straightforward way to generate a
probability space from a cfg: assign a non-zero parameter
to each rule in the grammar and require that the parameters
for each class of rules with a given left-hand side sum to
1. It is easy to see that this generates a non-zero
probability for each sentence in L(G), and that the sum
of the probabilities over L(G) (possibly an infinite
set) is 1.

For example, consider the grammar G

    G = <V,T,NP,P>, where

    V = { NP,ADP,ADJ,N }
and
    T = { ADJ,N }
and P has the rules

    (1,1) NP -> N
    (1,2) NP -> ADP N
    (2,1) ADP -> ADJ
    (2,2) ADP -> ADP ADJ

(this is a noun-phrase grammar).
Then L(G) is infinite since rule (2,2) may be
applied recursively so that for each natural number n,

    ADP => ADJ ... ADJ
         G   n times

    (sometimes denoted ADJ^n)

and hence

    NP => ADJ^n N
       G

for each natural number n.

Suppose we assign the following probabilities to the rules
in P:

    DISTRIBUTION D TO RULES IN GRAMMAR G

    Rule      Probability

    (1,1)     .6
    (1,2)     .4
    (2,1)     .7
    (2,2)     .3

(this is not unreasonable);
then the noun-phrase

    *) ADJ ADJ N

is parsed by the tree T*:

              NP
             /  \
          ADP    N
         /   \
      ADP     ADJ
       |
      ADJ

I shall say that the conditional probability of applying
rule (i,j) given that some rule in the i-class is to be
applied is the parameter associated with (i,j), and I
denote this parameter b[i,j]. The probability associated
with a tree T is the product of the parameters of the
sequence of rules that generates T. Hence, the
probability of *) is the expression:

    P(*) = b[1,2]*b[2,2]*b[2,1]

which evaluates to .084 for the distribution D given above.
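
The product rule for a tree is easy to check numerically. The following is a minimal sketch, assuming a tree is given by the list of labels of the rules used to build it; the names `D` and `tree_probability` are illustrative.

```python
from math import prod

# Distribution D: rule label -> parameter b[i,j].
D = {(1, 1): 0.6, (1, 2): 0.4, (2, 1): 0.7, (2, 2): 0.3}


def tree_probability(labels, b=D):
    """Multiply the parameters of the rules used to build the tree."""
    return prod(b[label] for label in labels)


# The tree for ADJ ADJ N uses rules (1,2), (2,2) and (2,1) once each.
p = tree_probability([(1, 2), (2, 2), (2, 1)])
print(round(p, 4))
```

The same function applies to any tree of the grammar once its rule labels are listed.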

II. PROBABILITY AND LINGUISTICS

While L(G) is infinite, the probability of
generating the noun-phrases of increasing length decreases
geometrically. Most of the probability is represented by
the noun-phrases in the following list:

    NOUN PHRASE          PROBABILITY (by Distribution D)

    n                    .6
    adj n                .28
    adj adj n            .084
    adj adj adj n        .0252

    total                .9892

Thus only about one percent is shared by the remaining
infinitely many noun phrases in G under the
distribution. It is the thinness of the tail of the
distribution of noun phrases (or sentences) that makes it
plausible to use cfg in predicting finite samples of
speech. The importance of this point is that it commits us
to dealing probabilistically if we are to make sense of the
idea that cfg can describe linguistic behaviour. Noam
Chomsky (1) often proposes infinite grammars as models for
speech (though he might not say it was a model), but at the
same time shuns probabilistic treatments of grammar as
being inappropriate. The data, however, are clear on this
much: given a system (such as my dictionary) for
classifying sentences, the noun-phrase

    ADJ N

is more likely than

    ADJ ADJ N

and

    ADJ^1000

has virtually no likelihood of being found. So we clearly
cannot hold that all sentences in L(G) are equally
likely. If we want to examine the phenomenon at all, the
only plausible explanation, given the acceptability of
context-free grammars as models for speech, is to affix a
probability measure to the rules of the grammars used to
model the speech.

(1) See Noam Chomsky, "Quine's Empirical
Assumptions" in Words and Objections: Essays on the Work
of W. V. Quine, D. Davidson and J. Hintikka (editors),
Dordrecht, Holland, 1969, pp. 53-68.
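
The geometric decay described in this section can be checked with a few lines of arithmetic. The following is a minimal sketch under Distribution D, in which the noun phrase with n adjectives has a single tree using rule (2,2) exactly n-1 times. The function name is illustrative.

```python
def noun_phrase_probability(n, b11=0.6, b12=0.4, b21=0.7, b22=0.3):
    """Probability of ADJ^n N (n adjectives followed by N) under Distribution D."""
    if n == 0:
        return b11                       # NP -> N
    return b12 * b21 * b22 ** (n - 1)    # NP -> ADP N, ADP built left-recursively


listed = [noun_phrase_probability(n) for n in range(4)]
tail = 1 - sum(listed)
print([round(p, 4) for p in listed], round(sum(listed), 4), round(tail, 4))
```

Each additional adjective multiplies the probability by b[2,2] = .3, which is why the four shortest noun phrases already account for nearly all of the probability.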

In the event that a given sentence-type has two or
more trees generated by a grammar G then G is said to
be grammatically ambiguous (cf. Chapter 3). A probability
distribution on a grammar generates a probability for each
tree. When a sentence-type has two or more trees, the
obvious solution is to sum together the probabilities of
the trees.

For example, if we add rule (2,3)

    (2,3) ADP -> ADJ ADJ

to G above, then the probability of *) is given by

    P(*) = b[1,2]*b[2,2]*b[2,1] + b[1,2]*b[2,3]

where b[2,3] is the probability of (2,3). (Of course,
Distribution D cannot be used unless b[2,3] is 0. If a
rule (i,j) is to have probability 0, it is a superfluous
rule in the present context.)

The question may quite appropriately arise: why
the particular probabilistic model imposed by fixing a
probability on each rule? The answer, I believe, is
inherent in the idea that the notion of cfg tries to
capture, if not in the formal definition itself. The idea
of a cfg is that a given rule (i,j) is used to replace
its left-hand side without regard to the rest of the tree
into which the replacement is made. Consider the (partial)
trees T1 and T2 in the grammar G:

[Figure: partial trees T1 and T2, each with a node ADP]

in relation to the rule

    (2,1) ADP -> ADJ

If we suppose that (2,1) has a probability p1 of being
applied to T1, and p2 of being applied to T2, such
that p1 is not equal to p2, then I would claim that the
underlying grammar is actually context-sensitive since we
are apparently looking at the "context" to determine which
probability is appropriate. A proof of the claim might be
to show that an algorithm suitable for determining which
probability to use would not be calculable, in general, by
a push-down automaton. Whether this would be completely
persuasive or merely begging the question is debatable.

Talking about the probability of generating a
particular sentence inherently uses "performance" language
and standards, in that we are providing a model for
observed linguistic behaviour. As much as one might be
disposed to finding this an inappropriate approach to the
philosophy of language, there is this much to either
account for or dismiss: it is commonplace to assert that
some things are more likely to be said than others, and the
hard evidence supports this completely.

This point can be illustrated by looking at two
recursive rules from the grammar GE1 that I have
developed for use with ERICA (see Table 3 for the complete
GE1). The rules are:

    (1,2) ADJP -> ADJP ADJ
    (14,2) ADVP -> ADV ADVP

Rule (1,2) is the recursive adjective phrase rule, and
rule (14,2) is the recursive adverbial phrase rule.
Tables 1 and 2 give the sentences in the ERICA corpus that
required these rules (2).

(2) The method used for obtaining these results
will be explained later in this chapter. The point of
introducing the results ahead of their explanation is to
make a point in regard to the low probability of long
strings of adjectives and adverbs. Incidentally, it is
implausible, looking at the results in Chapter 2 on the
length of utterance, that the length of the utterance
alone is a good predictor of the number of, say, adjectives
used. The fact is that the tendency to use repeated
adjectives drops off more quickly than the tendency to
increase length would indicate.
TABLE 1

SENTENCES IN ERICA THAT REQUIRE
THE RECURSIVE ADJECTIVE PHRASE RULE
GRAMMAR GE1
RULE: (1,2) adjp -> adjp adj

FREQ  No. of  SENTENCE TYPE                        No. of Usages of Rule
      TREES                                        (1 if blank)

 11           adj adj n
  7           adj adj adj                          2
  6           adj adj
  2           neg adj adj
  2           persp link art adj adj n
  2           pron link art adj adj n
  1           adj adj n n
  1           adj adj pron
  1           adj adj adj n                        2
  1           adj adj n v prep
  1           adj adj n v art n
  1           adj adj adj adj adj                  4
  1           adj adj n prep art n
  1           adj adj n mod neg v qu n
  1           adj adj adj adj adj adj adj adj      7
  1           adj adj n conj pron aux v art n
  1     5     adv adv adj adj                      (one per tree)
  1           adv link art adj adj pron
  1           art adj adj
  1           art adj adj n
  1           art adj adj n v
  1           art adj adj pron
  1           art adj adj adj n                    2
  1           conj art adj adj n
  1           conj pron link adj adj pron
  1           conj persp v art adj adj pron
  1           conj pron link art adj adj pron
  1           int pron link art adj adj n
  1           n link art adj adj n
  1           persp link adj adj
  1           persp art adj adj n
  1           persp v art adj adj n
  1           persp link neg adj adj adj           2
  1           persp link neg qu adj adj adj        2
  1           pn link art adj adj n
  1           pron link adj adj
  1           pron art adj adj n
  1           pron link art adj adj adj            2
  1           qu adj adj n

SENTENCE TYPES = 39     SENTENCE TOKENS = 63

TIMES RULE (1,2) WAS USED = 58
TIMES USED*FREQUENCY OF SENTENCE = 88

NOTE: Due to grammatical ambiguity in the corpus,
the above statistics may be misleading.



TABLE 2

SENTENCES IN ERICA THAT REQUIRE
THE RECURSIVE ADVERBIAL PHRASE RULE
GRAMMAR GE1

RULE: (14,2) advp -> adv advp

FREQ



1 3 

9 

7 

1 

1 

1 



No. of SENTENCE TYPE
Trees



No. of Usages of
Rule (14,2)



1 
1 
2 
1 

2 

5 



adv adv
inter link adv adv
persp link adv adv adj
adv adv adv
adv adv adj n
adv adv adj adj
persp link adv adv adj pron n
pron link adv adv adj



SENTENCE TYPES = 8     SENTENCE TOKENS = 29

TIMES RULE (14,2) WAS USED = 10
TIMES USED*FREQUENCY = 31

TABLE 3
GRAMMAR GE1



1 1 \ 






In «^ in A') 1 




jiH 1r> mA^j^ 7kA In 




avlVg^ Aviv 










ill 1 ^; 
9,1 ; 




aap — / aajp 


a 9 ^ 




a 0 ^ 


dup 06 C au jp 


1 ; 


qaOp aujp 


22, 2} 


^ m • ^ ^» ^ ^1 m 

q^cip — > c]U3rc AQjp 


22, 3; 


qadp quart 


22,4; 


qacip aec 




<]aQp 'z'' uGC au jP 


10,1; 


/4 A ^ ^ ^ T\ A ^ 4 

ae u / p ro n aq j 




SAC W r ^Glvl J 


    (2,1)     nounp -> pn
    (2,2)     nounp -> n
    (2,3)     nounp -> pron
    (13,1)    np -> npsub prepp
    (13,3)    np -> npsub conj npsub
    (13,4)    np -> npsub
    (17,1)    npsub -> persp
    (17,2)    npsub -> nounp
    (17,3)    npsub -> adp nounp
    (17,4)    npsub -> quart nounp
    (17,5)    npsub -> quart adjp nounp
    (5,1)     vbl -> auxilp vp
    (5,2)     vbl -> vp
    (16,1)    auxilp -> auxil
    (16,2)    auxilp -> auxil neg
    (15,1)    auxil -> aux
    (15,2)    auxil -> mod
    (3,1)     vp -> verb
    (3,2)     vp -> verb prep
    (3,3)     vp -> verb np
    (3,4)     vp -> verb np np

    (3,5)     vp -> verb prepp np
    (3,6)     vp -> [illegible]
    (3,8)     vp -> verb np prep
    (3,9)     vp -> verb prepp
    (11,1)    verb -> v
    (11,2)    verb -> v neg
    (19,1)    linkp -> link
    (19,2)    linkp -> link neg
    (7,1)     nom -> npsub prepp
    (7,3)     nom -> npsub conj npsub
    (7,4)     nom -> nomi
    (7,5)     nom -> [illegible] adp
    (18,1)    nomi -> npsub
    (18,2)    nomi -> nomi npsub
    (4,1)     a -> nom
    (4,2)     a -> inter
    (4,3)     a -> subj vbl
    (4,4)     a -> inter vbl
    (4,5)     a -> subj linkp prepp
    (4,6)     a -> inter linkp
    (4,7)     a -> mod subj
    (4,8)     a -> prepp
    (4,9)     a -> [illegible] subj linkp
    (4,10)    a -> linkp subj
    (4,11)    a -> subj linkp np
    [illegible]   a -> subj linkp qadp
    (4,1?)    a -> auxilp subj
    (4,14)    a -> [illegible] vbl
    (4,1?)    a -> subj linkp np np
    (4,18)    a -> auxilp subj [illegible]
    (4,19)    a -> auxilp subj
    (4,20)    a -> verb
    (4,21)    a -> intadv auxilp subj
    (4,22)    a -> intadv auxilp subj
    (4,23)    a -> intadv
    (4,24)    a -> verb subj
    (4,25)    a -> advp subj auxilp
    (4,28)    a -> subj auxilp
    (4,29)    a -> advp
    (4,30)    a -> inter subj
    (4,31)    a -> [illegible]
    (4,32)    a -> inter np vbl
    (4,33)    a -> advp subj vbl
    (4,35)    a -> vbl subj prep
    (4,37)    a -> verb subj np
    (4,38)    a -> intadv subj vbl
    (4,39)    a -> auxilp v
    (4,40)    a -> advp linkp subj
    (4,41)    a -> linkp qadp
    (4,42)    a -> inter linkp advp
    (4,43)    a -> subj vp auxilp
    (4,44)    a -> inter auxilp np verb
    (4,45)    a -> subj linkp
    (4,46)    a -> inter auxilp advp
    (12,1)    prepp -> prep np
    (6,1)     subj -> np
    (6,2)     subj -> np prepp
    (8,1)     s -> a
    (8,2)     s -> aff int
    (8,3)     s -> int aff
    (8,4)     s -> neg a
    (8,5)     s -> aff a
    (8,6)     s -> a aff
    (8,7)     s -> neg
    (8,8)     s -> aff
    (8,9)     s -> int
    (8,10)    s -> conj
    (8,11)    s -> aff aff
    (8,12)    s -> int int
    (8,15)    s -> neg neg
    (8,16)    s -> conj a
    (8,17)    s -> a conj
    (8,18)    s -> int a
    (8,19)    s -> a int
Tables 1 and 2 show the following trend in the
sentences that use the recursive adjective/adverb phrase
rules: the tendency to use the rules repeatedly is small.
Table 4 shows the type/token counts for the repeated usages
of these rules.

TABLE 4

REPEATED USAGES OF RECURSIVE RULES (1,2) AND (14,2)

RULE (1,2)

    NO. OF TIMES USED     TYPES     TOKENS

    1                      31        49
    2                       6        12
    3                       0         0
    4                       1         1
    5                       0         0
    6                       0         0
    7                       1         1

    Totals                 39        63

NOTE: This counts sentence type

    adv adv adj adj

only once, rather than counting for each of
the 5 ambiguities.

RULE (14,2)

    NO. OF TIMES USED     TYPES     TOKENS

    1                       7        28
    2                       1         1

    Totals                  8        29

NOTE: This count uses for each sentence type the
grammatical ambiguity that had the most usages of rule
(14,2).



III. MAXIMUM LIKELIHOOD AND ESTIMATIONS

If S is a set of sentence types, together with a
non-zero frequency for each sentence type, then S is a
sentence sample. The question, "How well does the cfg G
describe the syntax of sample S?" is one that can be given
meaning in terms of a probability distribution on G.
Several kinds of tests are available to determine the
"goodness of fit" of G to S. Among these, the method of
maximum likelihood stands out for its well-understood
properties. The method involves two steps: 1) estimating
the parameters (in this case, the b[i,j]'s) so that the
probability of S given G is a maximum; and 2) using
some test to evaluate the discrepancy between the observed
frequencies in the sentence sample S and the expected or
theoretical frequencies provided by the estimated
parameters.

Given any assignment of probabilities b[i,j] to
the rules of G, such that for all i,

$$ \sum_j b[i,j] = 1 $$

we have a probability for any sample S. If k is in S,
let FREQ(k) be the frequency associated with k. Assume
that no k in S is a lexically ambiguous form (see
Chapter 3). Then let PROB(k) be the expected probability
of k, computed by first finding the probability of each
tree for k and then summing over the probabilities for
all such trees, as above. The probability of S is then
given by the likelihood equation:

$$ L = \prod_{k \in S} \mathrm{PROB}(k)^{\mathrm{FREQ}(k)} $$

If G is grammatically unambiguous, then for each
k in S, PROB(k) is a product of some of the b[i,j]'s,
and the problem of finding values for the b[i,j]'s that
maximize L has a simple analytical solution (3).

If there are J rules in class i, then we
say that this class contributes J-1 independent
parameters. (This is because the rules must sum to 1 in
each class.)

(3) See [Suppes-1] for a simple derivation. The
solution is obtained by taking ln(L) (the natural
logarithm), computing the partial derivatives with respect
to the parameters, and solving the resulting
equations.

For the analytic solution, we need a simple
concept, the USAGE(i,j) of rule (i,j). For each i,j,
let USAGE(i,j) be the number of times that rule (i,j) is
used in deriving the sentences in S, weighted by the
frequencies. For example, if the rule (i,j) is used on
three sentences k1, k2, and k3, with frequencies
f1, f2, and f3, and supposing that rule (i,j) is used
twice on k3, then USAGE(i,j) is

     f1 + f2 + 2*f3

The analytical solution then gives us an estimate
for each b[i,j], the parameter associated with rule
(i,j), by the formula

$$ b[i,j] = \frac{\mathrm{USAGE}(i,j)}{\sum_{j'} \mathrm{USAGE}(i,j')} $$

The b[i,j]'s then are such that L is at a maximum (4).
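
The USAGE estimate can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `estimate_parameters`, the data layout (each sentence type as a frequency plus the list of rule labels used in its single derivation), and the toy sample are hypothetical, not taken from the ERICA data or from the programs used in this work.

```python
from collections import defaultdict

def estimate_parameters(sample):
    """sample: list of (frequency, rules_used); rules_used lists the (i, j)
    labels applied in the one derivation of a sentence type, with a label
    repeated once for every time the rule is applied."""
    usage = defaultdict(float)
    for freq, rules_used in sample:
        for rule in rules_used:
            usage[rule] += freq            # USAGE(i,j), weighted by frequency
    class_total = defaultdict(float)
    for (i, _), u in usage.items():
        class_total[i] += u                # sum of USAGE over the rules of class i
    return {rule: u / class_total[rule[0]] for rule, u in usage.items()}

# Hypothetical toy sample: three sentence types with frequencies 5, 3, 2.
sample = [
    (5, [(1, 1), (2, 1)]),
    (3, [(1, 1), (2, 2)]),
    (2, [(1, 2), (2, 2), (2, 2)]),         # rule (2,2) is used twice here
]
b = estimate_parameters(sample)
```

In this toy sample USAGE(2,2) is 3 + 2*2 = 7, mirroring the f1 + f2 + 2*f3 pattern of the worked example, and the estimates within each class sum to 1.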

Let G be grammatically ambiguous relative to a
sample S if and only if for some k in S, k has two
or more G-derivations. (Notice that the above maximum
likelihood solution requires only non-ambiguity of the
grammar relative to the sample S under consideration.) If,
however, G is relatively ambiguous, then the analytical
solution to the maximum likelihood problem is not known, to
the best of my knowledge. In general, the expressions for
the probability of a given k in S will be the sum of
products, and the terms of the maximum likelihood equation
become quite complicated.

(4) The solution to the maximum likelihood problem
for the unambiguous case generates only probabilities that
are in the interval [0,1], which is, of course, the
meaningful range for probabilities. Maximum likelihood
methods often have to contend with solutions outside of
this region.
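
To make the "sum of products" concrete, here is a minimal Python sketch of the negative log-likelihood for a relatively ambiguous grammar. The function name, the data layout (each sentence type as a frequency plus a list of trees, each tree a list of rule labels), and the numbers are assumptions for illustration only; this is not the equation-writing program described in this chapter.

```python
import math

def neg_log_likelihood(sample, b):
    """sample: list of (FREQ(k), trees); b: {(i, j): probability}."""
    total = 0.0
    for freq, trees in sample:
        # PROB(k) is a sum over the trees of k of a product of rule parameters.
        prob = sum(math.prod(b[rule] for rule in tree) for tree in trees)
        total -= freq * math.log(prob)
    return total

# Hypothetical sample: one unambiguous type and one type with two trees.
sample = [
    (6, [[(1, 1)]]),
    (4, [[(1, 1)], [(1, 2)]]),
]
nll = neg_log_likelihood(sample, {(1, 1): 0.8, (1, 2): 0.2})
```

Because the logarithm of a sum does not split into a sum of logarithms, the partial derivatives no longer separate rule by rule, which is why a numerical minimizer is needed in the ambiguous case.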

In an effort to approximate the solutions to these
equations I have used a numerical analysis program called
MINFUN (5).

In my experience, a reasonable approximation
appears to arise from what I call the equal weights
approximation method. Consider a sentence-type k with
n trees, and notice that if we had the appropriate
weights for each of the n trees, we could use them to
divide up the observed frequency of k and thus compute
the correct USAGE(i,j) for each rule (i,j). If there is
only a limited amount of grammatical ambiguity (say, less
than 5 percent of S), then to weight equally the n trees
for terminal-form k in S seems to give values for the
b[i,j]'s that are very little different from
MINFUN-generated values. (Originally, I used the equal
weights method to prepare initial values for MINFUN, and
found very little improvement even after hours of searching
on the probability space for improved values.)
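
A minimal Python sketch of the equal weights approximation follows. The function names and the toy sample are hypothetical; each sentence type is assumed to be given as a frequency together with the list of its trees, each tree being the list of rule labels it uses.

```python
from collections import defaultdict

def equal_weights_usage(sample):
    """sample: list of (frequency, trees); returns the approximate USAGE(i,j)."""
    usage = defaultdict(float)
    for freq, trees in sample:
        share = freq / len(trees)          # each of the n trees gets FREQ(k)/n
        for tree in trees:
            for rule in tree:
                usage[rule] += share
    return dict(usage)

def normalize(usage):
    """Turn usages into parameters that sum to 1 within each rule class."""
    totals = defaultdict(float)
    for (i, _), u in usage.items():
        totals[i] += u
    return {rule: u / totals[rule[0]] for rule, u in usage.items()}

# Hypothetical sample: the second type is ambiguous between two trees.
sample = [
    (6, [[(1, 1)]]),
    (4, [[(1, 1)], [(1, 2)]]),
]
usage = equal_weights_usage(sample)
b = normalize(usage)
```

The ambiguous type contributes 2 to each of its two trees, so rule (1,1) receives a usage of 8 and rule (1,2) a usage of 2.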

IV. CHI-SQUARE AND GOODNESS OF FIT TESTS

Any parameter estimation fixes a probability on
each sentence type k in S. It remains to test the
goodness of the fit. I used two main methods augmented by
several other statistical procedures. The main methods are
the chi-square and modified chi-square tests.



(5) I would like to thank Mr. Clark Crane of the
Stanford Computer Science Department for permission to use
his program MINFUN for this purpose. MINFUN was written in
OS/Fortran for the IBM 360/67. I rewrote it for use on the
PDP-10 in Fortran IV.

MINFUN estimates the maximum likelihood values for
the parameters by being fed the negative logarithm of the
maximum likelihood equation, as well as the partial
derivatives thereof. I wrote several programs to perform
this monumental equation writing and symbolic
differentiation, passing the equations to the FORTRAN
compiler for linkage to MINFUN by the loader. Details of
this process are available on request, but are not included
here due to their basic irrelevance.

To resolve the equations that are generated by even
a small sample S (say, the sentence types in ERICA with
frequency >= 5) requires a great deal of computation by
MINFUN. To deal with the entire distribution is quite
impossible. Each new grammar requires completely new
analysis.

With 75-plus independent variables, this problem is
quite messy for the MINFUN program. I have experimented
with several other programs, however, and only MINFUN has
the necessary understanding of the problem of forbidden
regions, which arises when parameters pass into values
representing physically or conceptually impossible
situations (here, the forbidden region is probabilities
outside the region [0,1]).
The chi-square test is well-known for its
distributional properties. Let SUM be the sum of the
frequencies of all k in S, and let EXP(k), the expected
frequency of k, be

     SUM * PROB(k)

The chi-square contribution of k is given by the formula

$$ \mathrm{CHISQUARE}(k) = \frac{(\mathrm{FREQ}(k) - \mathrm{EXP}(k))^2}{\mathrm{EXP}(k)} $$

I shall say (somewhat imprecisely) that the chi-square
statistic associated with a model is the sum over k of
CHISQUARE(k).

Tables of the level of significance of the
chi-square test are commonly available in any statistics
text.

To compute the level of significance, another
important factor is the degrees of freedom. Intuitively,
this is the number of things that are being predicted by
the model. It is the number of sentence types less the
number of independent parameters in the model, less 1
(since the fact that the probability must sum to 1 removes
a degree of freedom). The number of independent parameters
is

$$ \sum_i (J_i - 1) $$

where J_i is the number of rules with the label i.
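
The statistic and the degrees of freedom can be sketched in Python as follows. The function names and the three-type sample are hypothetical and serve only to show the arithmetic; no grouping of small cells is attempted here.

```python
def chi_square(freq, prob):
    """freq: {sentence type: FREQ(k)}; prob: {sentence type: PROB(k)}."""
    total = sum(freq.values())                     # SUM
    stat = 0.0
    for k, observed in freq.items():
        expected = total * prob[k]                 # EXP(k) = SUM * PROB(k)
        stat += (observed - expected) ** 2 / expected
    return stat

def degrees_of_freedom(n_types, rules_per_class):
    """rules_per_class: the number of rules J in each class of the grammar."""
    independent = sum(j - 1 for j in rules_per_class)
    return n_types - independent - 1

# Hypothetical sample of three sentence types and a single class of two rules.
freq = {"k1": 50, "k2": 30, "k3": 20}
prob = {"k1": 0.4, "k2": 0.4, "k3": 0.2}
stat = chi_square(freq, prob)
df = degrees_of_freedom(n_types=3, rules_per_class=[2])
```

Here the expected frequencies are 40, 40, and 20, so the first two types each contribute 2.5 and the third contributes nothing.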

Some of the problems associated with using the
chi-square test are:

1) The test should not be applied to sentence-types
k such that EXP(k) < 5. This is a rule of thumb
resulting from the problem that S is a discrete
distribution while the chi-square is based on a continuous
distribution. To counteract this problem, my estimating
program grouped together the expected and observed
frequencies of sentence-types k where EXP(k) < 5. The
grouping was done somewhat arbitrarily. I am not really
happy with this solution of grouping unless the
sentence-types can be grouped according to some criterion
that makes the group plausible.

2) The chi-square test is unrealistically sensitive
to sentence-types with smaller expected frequencies. This
is because the chi-square is a continuous distribution, but
the applications often made are to discrete distributions,
as is the case here. An attempt often made to correct for
this manifestation of the continuous nature of chi-square
is to subtract a small value from the term

     | FREQ(k) - EXP(k) |

used in the numerator of CHISQUARE(k). This correction for
continuity has little effect on the cells at the top of the
distribution; it is largely felt at the bottom where the
disparity between the discrete and continuous distribution
is greatest.

The second method used for determining the goodness
of fit is the modified chi-square, which simply reverses
the role of EXP(k) and FREQ(k). The contribution of k to
MCHI2 is

$$ \mathrm{MCHI2}(k) = \frac{(\mathrm{FREQ}(k) - \mathrm{EXP}(k))^2}{\mathrm{FREQ}(k)} $$

The point of the modified chi-square is to minimize the
effect of a few cells with very small expected frequency.
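
The difference between the two statistics is easiest to see on a single cell with a small expected frequency. The following Python sketch uses a hypothetical function name and made-up numbers.

```python
def contributions(observed, expected):
    """Return (CHISQUARE(k), MCHI2(k)) for one sentence type k."""
    diff_squared = (observed - expected) ** 2
    return diff_squared / expected, diff_squared / observed

# A hypothetical cell observed 6 times but expected only 1.5 times.
chi, mchi = contributions(observed=6, expected=1.5)
```

The same squared discrepancy is divided by 1.5 in one case and by 6 in the other, so the modified statistic charges this cell a quarter as much. Since every sentence type in a sample has non-zero frequency, the division by FREQ(k) is always defined.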

V. GEOMETRIC MODELS FOR CFG

The model for a cfg that has j-1 independent
parameters for each class i of rules of cardinality j
is called the full parameter model. It is, however,
possible to use only one parameter per class by ranking
rules (i,j) according to USAGE(i,j) and applying one of
several distributions that use only one parameter. In
Appendix 1, several models for the length of utterances in
ERICA are discussed. Examination of the properties of the
several distributions used in Appendix 1 (geometric,
Poisson, negative binomial) quickly reveals that the
geometric is the most plausible. The method I used for
applying the geometric distribution to cfg is: order the
rules (i,j) in a given class i, remove unused rules
(which therefore have probability 0), and apply the
geometric distribution, i.e., with a single parameter b
the probability assigned to the top rule in the class is
(1-b); to the next, b*(1-b); to the third, b^2*(1-b);
and so on. The last rule gets all the remaining
probability, hence the distribution is a truncated
geometric. Then solve for the value of b that maximizes
the probability that the USAGE distribution was obtained,
given the geometric model.
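
A small Python sketch of the truncated geometric model is given below. The function names are hypothetical, and the simple grid search is a stand-in chosen for brevity; it is not the procedure actually used to solve for b in this work.

```python
import math

def truncated_geometric(b, n):
    """Probabilities for n ranked rules: (1-b), b*(1-b), ..., with the last
    rule receiving all the remaining probability, b**(n-1)."""
    probs = [(1 - b) * b ** r for r in range(n - 1)]
    probs.append(b ** (n - 1))
    return probs

def fit_b(usages, steps=10000):
    """Grid search for the b that maximizes the likelihood of the usages."""
    ranked = sorted((u for u in usages if u > 0), reverse=True)  # drop unused rules
    best_b, best_ll = None, -math.inf
    for s in range(1, steps):
        b = s / steps
        probs = truncated_geometric(b, len(ranked))
        ll = sum(u * math.log(p) for u, p in zip(ranked, probs))
        if ll > best_ll:
            best_b, best_ll = b, ll
    return best_b

probs = truncated_geometric(0.5, 4)
b_hat = fit_b([40, 20, 10, 10, 0])     # hypothetical usages for one class
```

For the hypothetical usages 40, 20, 10, 10 the log-likelihood reduces to 70*log(1-b) + 70*log(b), which is maximized at b = 0.5.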

Most classes of rules lend themselves quite well to
the geometric model, and the chi-squares are little
different. The gain, statistically speaking, is in the
number of independent parameters involved in the model.
Some classes of rules have 48 members, and to predict the
usages of all these with only one parameter is somewhat
impressive. Conceptually, it suggests a mechanism for
syntax generation based on the class of rules that can
effect a certain replacement (e.g., the rules that replace
the noun phrase with a pronoun, a noun, a determiner-noun,
etc.).

Since various models have different numbers of
parameters, the best overall comparison I offer is the
chi-square (or modified chi-square) divided by the degrees
of freedom.

VI. LEXICAL AMBIGUITY AND PROBABILISTIC GRAMMARS

Grammatical ambiguity is unpleasant in that it
generates numerical problems that have no nice solution,
but at least grammatical ambiguity represents a
conceptually clear problem. We have a sentence-type, and
there are two or more trees for it. The case of lexical
ambiguity is more puzzling.

Let the lexical form of a given sentence be the
result of substituting the dictionary classifications for
the words in the sentence. A word is lexically ambiguous
if the classification for that word represents more than
one grammatical category. (See Chapter 3.) A lexical form
is a terminal form (or, alternatively, a sentence type)
only if there are no lexically ambiguous words in the
original sentence. In allowing the multiple
classifications of words in the dictionary, I created the
situation of never being quite certain as to what terminal
form a given utterance had. For example, 93 sentences in
ERICA had the lexical form

     *)   pron#aux,pron#link art n .

This lexical form could represent either the terminal
form

     pron aux art n

or

     pron link art n .

Lexically ambiguous forms, such as *), can be
thought of as a kind of shorthand, useful for a programmer
but conceptually baggage that needs removal. Terminal
forms such as the two above use only symbols in the grammar
GE1, while *) cannot have a probability according to GE1
without an explicit way of treating lexically ambiguous
forms in a sample.

Since the dictionary introduces lexical ambiguity,
it is appropriate to ask what is the dictionary's status in
the analysis. One view is that the dictionary is a
computational way of handling what is in fact a very large
grammar. The symbol

     pron#aux,pron#link

is the lexical classification given to such a contraction
as the word

     that's .

If we adopt seriously the view that the dictionary is a
"programmer's fiction", then we need to replace the
dictionary with the underlying grammar upon which the
analysis rests. This grammar would include a rule like

     n -> boy

for a word such as

     boy

that is classed as a noun (the symbol 'n'). For the word
'that's' we could include rules like

     pron#aux -> that's
     pron#link -> that's .

Actually, this is not quite context-free; however, we can
remove contractions as we scan for words (the algorithm for
which is representable by a finite automaton since it need
only look at some fixed number of characters at one
time, perhaps three). Then, add such context-free rules as

     pron -> that
     aux -> is
     link -> is .



An advantage of this method is that the terminals
of the grammar are actually words rather than symbols
standing for classes of words. Moreover, what I call
'lexical ambiguities' would actually be grammatical
ambiguities, and hence according to this super-grammar,
sentences could have well-defined probabilities. But the
astounding grammar this would generate would have over
4,000 rules for ERICA, and likewise, the full-parameter
model of the probabilistic grammar would have 4,000
independent variables. This would so dilute the evidence
of the data that we would have no probabilistic theory
left, and all but a few cliche-utterances would have
negligible probability, even if I had the computational
energy available, which I haven't. The use of the
dictionary moves the theory-testing up a level of
generality, from actual utterances to lexical forms of
utterances. Abandoning the dictionary, I should have to
predict the occurrence of individual words, and there
simply is not enough evidence to do this (6).

There is a deeper reason than practicality for
keeping the lexicon. I cannot believe that the simple
parsing of simple sentences requires of a child the kind of
computational energy that would be required of a computer
to handle a 4,000-rule context-free grammar. My experience
with parsers, both in connection with this work and in
relation to systems programming, strongly suggests that
this "brute-force" approach is not at all plausible.
Hence, I am prone to believe that a lexicon plays an
important theoretical role not to be subsumed by a grammar
as such. This is another manifestation of the
computation-performance orientation taken in this work.

(6) The large [K-F] corpus, referred to in Chapter
2, had over 1,000,000 word tokens in the sample. Even so,
the frequencies given for many words are very likely not
representative of written English.

As an example of the theoretical role that I think
of the dictionary as playing, consider the classic
ambiguous sentence:

     *)   I like flying planes.

The ambiguity is of course whether the speaker likes to fly
planes or likes planes that fly. I would assign to *)
the lexical form

     *)'  persp mod,v adj,v n .

Of the four alternative terminal forms represented by *)',
only two are parsed by the grammar, and each of these
corresponds to one of the expected ambiguities. Note that
the other two alternative forms were rejected by GE1 as
being ungrammatical. Here are the trees, as generated by
grammar GE1.
DERIVATION OF PERSP MOD V N
BY GRAMMAR GE1

a
  subj
    np
      npsub
        persp
  vbl
    auxilp
      auxil
        mod
    vp
      verb
        v
      np
        npsub
          nounp
            n

DERIVATION OF PERSP V ADJ N
BY GRAMMAR GE1

a
  subj
    np
      npsub
        persp
  vbl
    vp
      verb
        v
      np
        npsub
          adp
            adjp
              adj
          nounp
            n
The ambiguity of *) is lexical, according to GE1,
since the ambiguity is totally dependent upon the
classification of the words in *). A view of how the
hearer processes and responds to this sentence that is
consistent with my work is that he first looks up the words
in his dictionary (perhaps really a pre-selected
subdictionary dependent upon the content), and then parses
the resulting terminal form according to some grammar.
Thus, which of the ambiguities I select depends on whether
I "see" flying as an adjective or a verb. When the initial
selection gets me into some kind of difficulty, I return to
the lexicon for a subtler analysis of the words in the
sentence.

For my purposes, I used three techniques to
eliminate the lexical ambiguities present in ERICA. These
methods are described below.

A. SPLIT THE PROBABILITY

The first thing that I tried was to divide up the
observed frequency among the lexical and grammatical
ambiguities. This method was an extension of the equal
weights approximation for grammatical ambiguity.

Splitting the probability between lexical
ambiguities corresponds to the assumption that the
dictionary plays no theoretical role. Since I believe this
is false, the method is a purely ad hoc way to get a
meaningful probability distribution. I will describe it
since I think it is an alternative that has to be dispensed
with in order to understand the importance of the lexicon.

Actually, there are two variants of this method.
They are:

1) Let FREQ(k) be the frequency associated with k
in S. Then, if k has n alternative forms, let the
corrected observed frequency of each alternative form be
FREQ(k)/n. This simply assumes that each alternative
form is equally likely;

2) Let COUNT1(k,n1) be the number of derivations
for each n1 alternative form. Then, let COUNT(k) be the
sum over the n alternative forms of COUNT1(k,n1). The
corrected observed frequency of form n1 is then

$$ \frac{\mathrm{FREQ}(k) \cdot \mathrm{COUNT1}(k,n1)}{\mathrm{COUNT}(k)} $$

Both versions of the probability-splitting method were
used; I report the results in detail.
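
The two variants can be sketched in Python as follows. The function names are hypothetical, and the derivation counts in the second call are invented for illustration; only the frequency 93 and the two alternative forms come from the example discussed earlier.

```python
def split_equally(freq, n_forms):
    """Variant 1: each of the n alternative forms gets FREQ(k)/n."""
    return [freq / n_forms] * n_forms

def split_by_derivations(freq, derivation_counts):
    """Variant 2: each form gets FREQ(k) * COUNT1(k,n1) / COUNT(k)."""
    total = sum(derivation_counts)             # COUNT(k)
    return [freq * count / total for count in derivation_counts]

# A lexical form with frequency 93 and two alternative terminal forms.
equal = split_equally(93, 2)
# Hypothetical counts: one derivation for the first form, two for the second.
weighted = split_by_derivations(93, [1, 2])
```

In both variants the corrected frequencies add back up to FREQ(k), so the total size of the sample is unchanged.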

B. RESCANNER METHOD

A second way of handling the problem is to devise
an algorithm for looking at the lexical ambiguities and
deciding how to handle them. One explanation for this
method is that it would extend the "methods" of the grammar
to cases formally beyond the grammar. This interpretation
better fits the probabilistic method (C below). What I
have in mind in the rescanner model is something else.

The theoretical hypothesis I have in mind is that
the initial response to a sentence consists of putting the
sentence into a lexical form, including initial
disambiguation, then proceeding to parse the terminal form
or forms. If the sentence has a clear ambiguity (such as
in many jokes, where the clear point is to have an apparent
ambiguity as the basis of the humor), then the lexical form
will be ambiguous; however, the listener will usually
select the most likely classification from the lexicon
alone for the first pass at parsing the sentence. In the
above 'flying planes' example, the listener might classify
the word 'flying' as a verb before the parsing algorithm
was even called. This method of lexical disambiguation is
specifically oriented toward the listener.

C. PROBABILISTIC MODEL

The most satisfactory method of lexical
disambiguation I have implemented is based on the
probabilistic model. Briefly, each of the lexical
ambiguities is assigned a probability, and the most likely
ambiguity selected. The exact details of this approach are
given below, after a discussion of the grammar GE1.

In the 'flying planes' sentence above, the
alternative form

     persp mod v n

had probability .0014, and was hence selected by the model
over the form

     persp v adj n

which had probability .00016. The grammar would therefore
select the reading of the sentence which means that the
speaker likes to fly planes.

I am not personally convinced that this is the
correct approach to lexical ambiguity. Particularly, I
think that ambiguity is really semantical; but this does
not preclude the possibility that disambiguation is done on
the basis of syntax alone. I assume that the full
machinery of language processing is seldom called into
play.

However, the probabilistic model does one thing:
it provides a concrete example of the meaningful use of a
probability measure on a context-free grammar.
VII. THE GRAMMAR GE1

As mentioned, Table 3 contains the grammar GE1.
This grammar is something of a compromise as it was
developed from the interacting tension of four criteria,
which are:

1. recognize as much of ERICA as possible;

2. minimize both grammatical and lexical
ambiguity;

3. provide a good probabilistic model for
the sample ERICA;

and, most importantly,

4. provide a good test for the semantical
theory I had in mind.

Better grammars could no doubt be written for any one
single purpose. Rather than include a whole complement of
grammars in this work, I decided to include one that tries
to be a complete model. I am pessimistic about the future
of probabilistic grammars unless they are implemented in
the service of disambiguation and semantical evaluation.
Needless to say, grammar GE1 is the product of many dozens
of discarded grammars.

Several high-frequency lexical forms are casualties
of GE1, and are not recognized at all by the grammar.
Appendix 5 lists those forms with frequency greater than or
equal to 5, and shows: i) how many lexical ambiguities
were in a form; ii) how many trees per lexical ambiguity;
iii) and the forms with frequency >= 5 that are not
recognized by GE1.

Some of the high-frequency failures of GE1 are
(7):

1)   28 adj adv .

Adding the rule

     s -> adj adv

will of course parse this terminal form and will do so
without affecting the rest of the grammar at all. There
is, however, little to be gained by such an ad hoc
solution; indeed, adding one rule to recognize one
sentence-type is something of a loss. Of course, any
corpus of n utterances can trivially be recognized by a
cfg with n rules, so it is not surprising that a single
rule can often be trivially added to a grammar.

2)   26 mod persp v,mod prep,adv adv
     10 persp v,mod prep,adv adv

Many of the forms not recognized represent a
complex verb phrase, perhaps including modal verbs,
prepositions, and adverbs. My efforts to include these in
L(GE1) resulted in many added grammatical ambiguities
elsewhere. A minimal distinction required to deal with
verb phrases more adequately is the transitive-intransitive
distinction in verbs.

(7) It is my practice to precede utterances, words,
and phrases with a number. That number is the frequency in
the data under consideration, usually the ERICA corpus.

The transitive-intransitive distinction is designed
to distinguish between verbs that take no objects, and
verbs that can take, say, a direct object. Unfortunately,
the same verb can take 0, 1, or 2 objects (and perhaps
more). Consider the uses of the verb 'to read' in the
three sentences:

1) John is reading.

2) John is reading the Bible.

3) John is reading the Bible to a blind man.

Each sentence clearly uses the same word in (approximately)
the same sense; yet the number of objects varies. If the
constructions possible by the grammar depend upon the
number of objects the verb may take, then we need to list
'to read' as several different kinds of verbs for usages
that are not very different. Moreover, semantically there
is no reason to stop at two objects: we might add object
"slots" for time, place, and other adverbial concepts. In
Chapter 5 I argue that the simplest semantical
interpretation for verbs does not seem to require the
transitive-intransitive distinction as a part of the
syntax.

To carry out the transitive-intransitive
distinction in a semantically sensible way would be to let
"transitive" refer to verbs that may take objects
optionally. This approach would, however, lead to
classifying the objective cases of certain pronouns in the
dictionary. (For example, the objective case of 'I' is
'me'.) My dictionary is not this subtle.

3)   13 persp#aux,persp#link qu,pron v

can be handled by adding the rule

     persp aux pron v

to GE1. I did not do this because I am confused by the
order of the verb in the sentence, and I also feel that I
need the transitive-intransitive distinction to handle
this.

VIII. LEXICAL AMBIGUITY IN THE ERICA CORPUS

Of course it is desirable to write a grammar that
has a minimum of ambiguity, both lexical and grammatical.
A cfg G can resolve a lexically ambiguous form if and
only if exactly 1 of the terminal forms is recognized by
G. (If none at all is recognized, then the sense of
resolution is that of dissolution, suitable for
philosophers but unsettling to programmers.) The sentence

     93 pron#aux,pron#link art n

is a case of resolution. The alternative terminal form

     pron link art n

is recognized by GE1, while

     pron aux art n

is not recognized by GE1. This is intuitively satisfactory
if one looks at the 93 original sentences in the original
corpus. When G resolves a lexically ambiguous lexical
form, the alternative terminal form that was recognized is
called the resolved lexical form. In the above,

     pron link art n

is the resolved lexical form.

A slightly more subtle example of the resolution of
lexical ambiguity occurs in the lexical form

     *)   27 adv persp link,aux

where the alternative form

     adv persp aux

is recognized while

     adv persp link

is not. This is again intuitively satisfactory if we look
at the actual 27 utterances in their original contexts; the
reason is that adverbs seldom modify the linking verb.

Words classified as

     link,aux

are the forms of the verb 'to be'. The reason for having a
multiple dictionary classification for these words is that
it is necessary to distinguish semantically their uses.

If k is a lexically ambiguous form with n > 2
alternative terminal forms, then G is said to reduce k
if G recognizes n' of the n alternative forms, for
1 < n' < n. Reduction may generate a new lexical form.
When it does, the new form is called the reduced lexical
form.
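
The definitions of resolution and reduction can be illustrated with a short Python sketch. It assumes the notation used for lexical forms in this chapter, in which ',' separates the alternative classifications of one word and '#' joins the parts of a contraction. The function names and status labels are hypothetical, and the set `recognized` is a stand-in for a real parser; it lists only terminal forms that the text reports as parsed by GE1.

```python
from itertools import product

def alternative_forms(lexical_form):
    """Expand a lexical form into its alternative terminal forms."""
    choices = [word.split(",") for word in lexical_form.split()]
    return [" ".join(choice).replace("#", " ") for choice in product(*choices)]

def classify(lexical_form, recognized):
    """Label a lexical form by how many of its alternatives are recognized."""
    forms = alternative_forms(lexical_form)
    accepted = [form for form in forms if form in recognized]
    if not accepted:
        status = "not recognized"
    elif len(forms) == 1:
        status = "unambiguous"
    elif len(accepted) == 1:
        status = "resolved"
    elif len(accepted) < len(forms):
        status = "reduced"
    else:
        status = "still ambiguous"
    return status, accepted

# Stand-in for the grammar: terminal forms reported as recognized by GE1.
recognized = {"pron link art n", "persp mod v n", "persp v adj n"}
resolved = classify("pron#aux,pron#link art n", recognized)
reduced = classify("persp mod,v adj,v n", recognized)
```

The first form has two alternatives of which one is recognized, so it is resolved; the 'flying planes' form has four alternatives of which two are recognized, so it is reduced.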

There is a great deal of lexical ambiguity in
ERICA. Of the 2,995 types, 2,185 are lexically ambiguous.
Many of the low-frequency sentence-types contribute to this
pessimistic figure, since of the 9,085 sentence-tokens,
only 4,419 are lexically ambiguous.

GE1 parses about 78 percent of the tokens in ERICA
and resolves about 56 percent of the lexical ambiguities.
Table 5 details these results, showing both absolute
numbers and percentages.

As a measure of the success of GE1 in removing
lexical ambiguity, I calculated the ambiguity factor thus
defined: for each sentence-type k in the sample,
multiply FREQ(k) by the number of alternative terminal
forms less 1. Then the ambiguity factor is the sum of this
quantity over the k in the sample. The measure is
intended to suggest how many "extra" lexical
interpretations there are. The ambiguity factor for the
complete corpus was originally 11,885; for that portion of
the corpus parsed by GE1, the factor was 6,010, indicating
that many very ambiguous sentence-types were not recognized
by GE1; the ambiguity factor for the set of resolved and
reduced lexical forms was 7ti1. I take this to be quite an
improvement, although the only data I have to compare it
against are the results of (many) earlier grammars. One
earlier grammar had had somewhat better values; however, it
only recognized about 73 percent of ERICA.
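
The ambiguity factor is a simple weighted sum, as the following Python sketch shows. The function name is hypothetical; the three entries are lexical forms quoted in this chapter, paired with their frequencies and their numbers of alternative terminal forms.

```python
def ambiguity_factor(sample):
    """sample: iterable of (FREQ(k), number of alternative terminal forms)."""
    return sum(freq * (n_forms - 1) for freq, n_forms in sample)

sample = [
    (93, 2),   # pron#aux,pron#link art n
    (27, 2),   # adv persp link,aux
    (10, 4),   # persp v,mod prep,adv adv
]
factor = ambiguity_factor(sample)
```

A form with a single terminal form contributes nothing, so the factor is zero exactly when no lexical ambiguity remains in the sample.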

i 
TABLE 5

LEXICAL AMBIGUITY AND GRAMMAR GE1

CHILD PORTION OF ERICA

                                     TYPES     TOKENS

TOTAL SIZE                           2,995      9,085

LEXICALLY AMBIGUOUS PORTION          2,185      4,419
                                    72.95%     48.64%

NON-L.A. PORTION                       810      4,666
                                    27.05%     51.36%

PORTION PARSED BY GE1                1,394      7,046
                                    46.54%     77.56%

PORTION OF L.A. PARSED               1,033      3,030
                                    47.28%     68.57%

PORTION OF NON-L.A. PARSED             361      4,016
                                    44.57%     86.07%

L.A. COMPLETELY RESOLVED BY GE1        831      2,464
                                    38.03%     55.76%

L.A. REDUCED BUT NOT RESOLVED          105        194
                                     4.81%      4.39%
The resolution and reduction of lexical ambiguity
reshapes the lexical forms present in the corpus, as
originally distinct forms become the same. For example,

     400 persp v#neg,mod#neg v

merges with two other forms to become

     402 persp mod neg v

when the resolution of lexical ambiguity occurs. This
merging process I call consolidation. GE1 recognized 1,394
of the original 2,995 types in ERICA. After consolidation,
1,125 types remained, still accounting for 7,046 tokens.
This is encouraging since it means that there were fewer
types in the sample than the original pass at the
dictionary would have suggested.

The major onus (as far as this chapter is
concerned) for accounting for the remaining lexical
ambiguities comes from the need to obtain a sample that can
have a probability distribution generated by a context-free
grammar. Trying to resolve all such ambiguity by a grammar
is an idea that is seductively difficult.

What is more possible is to devise an algorithm,
perhaps with some context-sensitive elements, that extends
the way that the grammar handles ambiguities when it is
successful to the cases where it is not successful. This
approach suggests a model with a rescanner that looks at
unresolved ambiguities after an initial parse by a
context-free grammar.

Tha "raacannar modal* I used on the ERICA corpus 
simply picks tha most "likely* ainjle classification, in 

mo«t cases. I looked at the ways in wnich GE1 resolved 

I 

ambiguities, the frajuencles of single clasaif icatlons in 
the dictionary, and also the sentences cneraselves in 
developing the algorithm, wnich is ahown in Table 6. The 
left-hand column is the ambiguous classifications the 
right-hand oolunn 8how& what it was resolved to, and, in a 
few casen, gives a simple conditional rule* 
TABLE 6

RESCANNER MODEL FOR DISAMBIGUATION
ALGORITHM FOR RESOLUTION OF LEXICAL AMBIGUITY REMAINING AFTER GE1

LEXICAL AMBIGUITY              RESOLUTION

qu,pron                        qu
n,adj                          n
v,mod                          v
v,aux                          v
link,aux                       link
persp,pronadj                  pronadj
n,adv                          n
v#neg,mod#neg                  v neg
padj,pn#aux,pn#link            padj
padj,pn#aux,n#link             padj
persp#link,persp#aux           persp link
pron#aux,pron#link             pron link
persp#aux,persp#link           persp link
inter#aux,inter#link           inter link
aux#neg,link#neg               link neg
padj,pn#link                   padj
prep,conj                      conj
padj,n#link                    padj
n#aux,n#link                   n link
prep,adv                       (if the next word or last word was
                               adv, then prep, else adv)
n,v                            (if n leaves the sentence all nouns,
                               then v, else n)
The algorithm favors nouns, then adjectives, then
verbs over the other classes. There is something vaguely
to be said for the claim that this algorithm extends the
methods of GE1. An exception is the resolution of
'qu,pron' to 'qu'. GE1 usually resolves to 'pron', since
it does not leave a quantifier that modifies no noun
phrase. The above algorithm, however, resolves 'qu,pron'
to 'qu', since most of the remaining ambiguities are what
appear to be noun phrases. The problem is caused by the
rules that allow multiple noun-phrases to be noun-phrases;
inadvertently, these rules let 'qu,pron' be either a 'qu',
modifying the noun, or a 'pron', a part of a multiple
noun-phrase. Two high-frequency sentences displaying this
problem are

     persp v qu,pron n

     persp v prep qu,pron n

The trees for these sentences are given in Table 7, thus
illustrating the problem with multiple noun-phrases.
TABLE 7

TREES SHOWING CONFUSION IN GE1 OVER qu,pron

TREES FOR persp v qu,pron n

  (a) 'qu,pron' read as 'pron':

     s
       subj -> np -> npsub -> persp
       vbl  -> vp
                 verb -> v
                 np   -> npsub -> nounp -> pron
                 np   -> npsub -> nounp -> n

  (b) 'qu,pron' read as 'qu':

     s
       subj -> np -> npsub -> persp
       vbl  -> vp
                 verb -> v
                 np   -> npsub
                           quart -> qu
                           nounp -> n

TREES FOR persp v prep qu,pron n

  (a) 'qu,pron' read as 'pron':

     s
       subj -> np -> npsub -> persp
       vbl  -> vp
                 verb  -> v
                 prepp
                   prep
                   np  -> npsub -> nounp -> pron
                 np    -> npsub -> nounp -> n

  (b) 'qu,pron' read as 'qu':

     s
       subj -> np -> npsub -> persp
       vbl  -> vp
                 verb  -> v
                 prepp
                   prep
                   np  -> npsub
                            quart -> qu
                            nounp -> n
Table 8 gives the statistical results of using the
above rescanner model for disambiguation on ERICA, for
the various combinations of the full parameter model versus
the geometric model, and the chi-square versus the modified
chi-square. All models group for expected frequency less
than 5, and include the correction for continuity of .5, as
explained above. The results are summarized only, and give
the chi-square (or modified chi-square), the degrees of
freedom, the chi-square divided by the degrees of freedom,
and a statistic called the residual. The residual is
simply the difference between the sum of the observed
frequencies and the sum of the expected frequencies. It is
therefore the number of sentences that the grammar
predicted that we would find, for sentence-types that were
not found at all. Recall that every sentence in L(GE1)
has a non-zero probability, and that L(GE1) is infinite,
since it contains some recursive rules. Hence, we should
always expect a non-zero residual; but the smaller, the
better. The size of the residual is yet another measure of
the goodness of fit.
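
The statistics just described can be sketched in a few lines of Python. This is only an illustration, not the program used for this work: the function name is hypothetical, and the pooling strategy (low-expectation cells are accumulated in order until the pooled expectation reaches 5) is an assumption, since the text does not say how the groups were formed. The modified chi-square is not shown.

```python
def chi_square_fit(observed, expected, min_expected=5.0, correction=0.5):
    """Return (chi_square, number_of_cells, residual) for paired frequencies."""
    cells = []                          # (observed, expected) pairs after pooling
    pool_obs = pool_exp = 0.0
    for obs, exp in zip(observed, expected):
        if exp >= min_expected:
            cells.append((obs, exp))
        else:
            # Assumption: pool low-expectation cells in order of appearance.
            pool_obs += obs
            pool_exp += exp
            if pool_exp >= min_expected:
                cells.append((pool_obs, pool_exp))
                pool_obs = pool_exp = 0.0
    if pool_exp > 0:                    # leftover pool below the threshold
        cells.append((pool_obs, pool_exp))

    # Chi-square with the .5 correction for continuity applied to each cell.
    chi_square = sum(
        max(abs(obs - exp) - correction, 0.0) ** 2 / exp for obs, exp in cells
    )
    # Residual: sum of observed minus sum of expected frequencies.
    residual = sum(observed) - sum(expected)
    return chi_square, len(cells), residual


# Made-up frequencies, for illustration only.
chi, cells, residual = chi_square_fit([60, 25, 3, 2], [54.0, 27.0, 3.6, 2.7])
print(round(chi, 4), cells, round(residual, 1))
```

With these made-up numbers the last two cells are pooled into one, so three cells enter the sum, and the residual of 2.7 is the expected frequency that the model assigns to types absent from the sample.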
TABLE 8

RESCANNER MODEL OF LEXICAL DISAMBIGUATION
PROBABILISTIC MODELS OF ERICA SPEECH*
GRAMMAR GE1

MODEL                   CHI-SQUARE   RESIDUAL   DEGREES OF   GROUPS   CHI-SQUARE /
                                                FREEDOM               DEGREES OF FREEDOM

Full parameter,
Chi-square               24,001.52   2,117.40      106         88         226.43

Geometric,
Chi-square               47,139.22   1,540.84      120         69         392.83

Full parameter,
Modified chi-square      21,078.16   2,117.40      106         88         198.85

Geometric,
Modified chi-square      14,219.49   1,540.84      120         69         118.50

* After consolidation, the rescanner model had
1,072 sentence types, still accounting for
7,046 tokens.
The most accurate method I used for lexical
disambiguation is the probabilistic method. Starting with
values for each b[i,j], I computed the probability of
each alternative lexical form for a type, and then selected
the most probable alternative. (I used the values
generated by the rescanner model given above as the
parameters.) The method turned out to be uncannily subtle.

For example, on the lexical form

     11     persp v qu,pron n

discussed above, the alternative

     persp v qu n

had a probability of .0036 while the other alternative

     persp v pron n

had only .00005. Likewise, for the form

     persp v prep qu,pron n

the probability was .0000815 for

     persp v prep qu n

which was preferred to

     persp v prep pron n

with a probability of .0000119.

Of course the rescanner model made the same choices
in these cases. The probabilistic model turned out to be
much more sensitive in cases such as

     3      qu,pron qu,pron qu,pron qu,pron qu,pron pron

Of the 32 alternative forms here, 13 were recognized by
GE1. The rescanner model chose

     qu qu qu qu qu pron

(which may well be correct) while the probabilistic model
selected

     qu pron qu pron qu pron

indicating, at least, that it is trying to follow the
grammar closely.

Since the rescanner model always replaces 'qu,pron'
by 'qu', in particular the lexical form

     qu,pron

is resolved by the rescanner to

     qu

This is clearly unsatisfactory. The probabilistic model
makes the intuitively correct choice, as is shown in Table
9, which includes the resolutions made by the probabilistic
model where FREQ(x) >= 5.

After disambiguation by the probabilistic method,
there were 1,060 types remaining (having begun with 1,125).
Table 10 gives the statistical results of the various ways
of testing the fit.
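
A minimal sketch of the selection step, assuming that the probability of each unambiguous reading is already available in a dictionary. The function names and the dictionary representation are illustrative assumptions; in the work itself the probabilities come from the parameters of GE1. The two probabilities used below are the ones quoted above.

```python
from itertools import product


def alternatives(lexical_form):
    """Expand a form such as 'persp v qu,pron n' into all unambiguous readings."""
    choices = [word.split(",") for word in lexical_form.split()]
    return [" ".join(reading) for reading in product(*choices)]


def resolve(lexical_form, probability):
    """Pick the most probable reading; readings without an entry count as 0."""
    return max(alternatives(lexical_form),
               key=lambda reading: probability.get(reading, 0.0))


# Probabilities quoted in the text for the two readings of this form.
probability = {"persp v qu n": 0.0036, "persp v pron n": 0.00005}
print(resolve("persp v qu,pron n", probability))   # persp v qu n

# Five two-way ambiguities give 2**5 = 32 alternative forms.
print(len(alternatives("qu,pron qu,pron qu,pron qu,pron qu,pron pron")))  # 32
```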

Grammatical ambiguity remaining in the corpus is
actually rather small. This could be because many of the
"classical ambiguities" are lexical in nature. The
following gives the number of types (and tokens) with
various numbers of derivations. (A type has 1 derivation
just in case it is not ambiguous.)

GRAMMATICAL AMBIGUITY REMAINING AFTER LEXICAL DISAMBIGUATION
PROBABILISTIC MODEL OF DISAMBIGUATION

NUMBER OF            TYPES      TOKENS
DERIVATION(S)

     1                 980       6,919
     2                  78         125
     3                   1           1
     4                   0           0
     5                   1           1

About 92 percent of the types (98 percent of the tokens) in
this reformed sample are grammatically unambiguous. This
is sufficient, I claim, for assurance that the equal
weights approximation method will give reasonable values to
the maximum likelihood problem.
TABLE 9

PROBABILISTIC MODEL OF LEXICAL DISAMBIGUATION
SOME HIGH-FREQUENCY DISAMBIGUATIONS*

FREQ  RESOLUTION             COUNT       SOURCE                          PROB

 87   pron                   (1,1)       qu,pron                         .0135?7?
 30   qu pron                (1,1)       qu,pron pron                    .0012?78
 27   qu n                   (1,1)       qu,pron n                       .0033439
 24   adv persp mod          (1,1)       adv persp v,mod                 .0005833
 14   v qu n                 (1,1)       v qu,pron n                     .0008767
 12   persp v qu n           (1,1)       persp v qu,pron n               .0003633
 11   persp v                (1,1)       persp v,mod                     .0161562
  9   inter link adv adv     (1,1)       inter#aux,inter#link adv adv    .0001362
  8   persp mod neg          (1,1)       persp v#neg,mod#neg             .0007625
  7   persp link             (1,1)       persp link,aux                  .0013?75
  7   mod neg persp          (1,1)       v#neg,mod#neg persp             .0003895
  6   aff persp v            (1,1)       aff persp v,aux                 .0001036
  6   persp v                (1,1)       persp v,aux                     .0161562
  6   persp v prep qu n      (1,1)       persp v prep qu,pron n          .0000815
  6   pron link pron         (1,1)       pron#link qu,pron               .0004769
  6   pron link qu n         (1,1)       pron#link qu,pron n             .0001174
  6   pron qu pron           (1,0,1,1)   pron,qu qu,pron pron            .0000??6
  6   v persp                (1,1)       v,aux persp                     .0116550
  5   aff persp link         (1,1)       aff persp link,aux              .00000?0
  5   link pron art n        (1,1)       link,aux pron art n             .0000008
  5   persp v qu pron        (1,1)       persp v qu,pron pron            .0001378
  5   persp aux neg          (1,1)       persp aux#neg,link#neg          .0002396

* The SOURCE is the lexically ambiguous form. The
numbers in the COUNT indicate, for each alternative form,
the number of derivations of that alternative according to
GE1. PROB is the probability associated by GE1 to the
alternative that is best, which is then the RESOLUTION.
TABLE 10

PROBABILISTIC MODEL OF LEXICAL DISAMBIGUATION
PROBABILISTIC MODELS FOR THE GRAMMAR GE1

MODEL                   CHI-SQUARE   RESIDUAL   DEGREES OF   GROUPS   CHI-SQUARE /
                                                FREEDOM               DEGREES OF FREEDOM

Full parameter,
Chi-square                  22,215      2,108      109         90         203.81

Geometric,
Chi-square                  45,776      1,487      125         72         366.21

Full parameter,
Modified chi-square         15,834      2,108      109         90         145.27

Geometric,
Modified chi-square         12,206      1,487      125         72          97.65
Appendix 6 contains the complete printout of the
*s for the full parameter and geometric models.
Also, I include a run of the full parameter model on the
sentence-types with frequency >= 5, which is Appendix 7. A
complete printout of this would run several hundred pages.

IX. PROBABILISTIC GRAMMARS AND UTTERANCE LENGTH

In Appendix 1 I discuss the length of utterances in
ERICA, and offer several probabilistic models to account
for utterance generation. Table 3 of Appendix 1 gives the
length distribution for the entire corpus, showing that the
most probable length is 1, followed closely by 2 and 3.
While the negative binomial distribution fits this
reasonably well, as it stands it suggests no mechanism for
utterance production.

A probabilistic grammar is such a mechanism. Given
a (non-zero) distribution to a grammar G, each sentence in
L(G) has a probability. Hence, for each length i, there
is a probability associated with i, which is the sum of the
probabilities of the sentences k ∈ L(G) such that
|k| = i (8).

I have computed this sum for i = 1,...,4 (9). The
results follow (using the parameters resulting from the
probabilistic model of lexical disambiguation). Included
also are the number of utterances in L(GE1) with a given
length; this number grows surprisingly quickly.

(8) See [Suppes-2] pp. 25-29.

UTTERANCE LENGTH ANALYSIS

Length     Freq (in L(GE1))     Prob

  1                17           .298
  2               180           .238
  3             1,242           .182
  4             8,929           .135

no. of utterances    = 10,368
total probability    = .853
residual probability = .147

The first four lengths account for about 85 percent of the
probability of utterance distribution. Using these values
as a predictor for the values in Table 3 of Appendix 1, we
find the following results.

(9) The algorithm I used for this computation was to
generate all the length-i utterances (in internal
representation in my programs) and check each one. Since
there are 21 terminals in the grammar GE1, this means that
the program had to check 204,204 possible utterances, which
required 40 minutes of computation time. A much more
efficient method would be to look "top-down" at the
sentences, expanding the tree according to some strategy;
however, the programming investment is beyond the worth of
the question in connection with this work.
OBSERVED VS. PREDICTED UTTERANCE LENGTHS
GRAMMAR GE1

LENGTH     OBSERVED     THEOR.       THEOR.     CHI-SQUARE
           FREQ.        FREQ.        PROB.

  1         2,072       2,707.33     .298         149.09
  2         2,064       2,162.23     .238           4.46
  3         1,950       1,653.47     .182          53.18
  4         1,142       1,225.47     .135           5.82

TOTAL       7,228       7,749.50     .853         212.55
PERCENT     .7959       .853

GE1 predicts that we will find about 85 percent of the
utterances in this range. In fact, only about 80 percent
are there. I think that the explanation is that GE1 is
simply incomplete, in that it doesn't parse as many of the
more complicated forms as it should.
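
The exhaustive computation behind the length analysis (every string of each length is generated and checked, as footnote 9 describes) can be sketched as follows. This is an illustration under stated assumptions, not the original program: the grammar is assumed to be in Chomsky normal form, the probability of a string is computed with a CYK-style chart that sums over all derivations, and the two-rule toy grammar stands in for GE1, whose rules are not reproduced in this section.

```python
from collections import defaultdict
from itertools import product


def inside_probability(words, lexical, binary, start="S"):
    """Sum of the probabilities of all derivations of `words` from `start`."""
    n = len(words)
    chart = defaultdict(float)      # (i, j, A) -> probability A derives words[i:j]
    for i, word in enumerate(words):
        for (A, terminal), p in lexical.items():
            if terminal == word:
                chart[i, i + 1, A] += p
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span
            for k in range(i + 1, j):
                for (A, B, C), p in binary.items():
                    left = chart.get((i, k, B), 0.0)
                    right = chart.get((k, j, C), 0.0)
                    if left and right:
                        chart[i, j, A] += p * left * right
    return chart.get((0, n, start), 0.0)


def length_distribution(terminals, lexical, binary, max_length, start="S"):
    """Map each length to (number of sentences in L(G), their total probability)."""
    result = {}
    for length in range(1, max_length + 1):
        count, total = 0, 0.0
        for words in product(terminals, repeat=length):   # brute-force enumeration
            p = inside_probability(words, lexical, binary, start)
            if p > 0:
                count += 1
                total += p
        result[length] = (count, total)
    return result


# Hypothetical toy grammar: S -> a (.7), S -> S S (.3), over the terminals a, b.
dist = length_distribution(["a", "b"], {("S", "a"): 0.7}, {("S", "S", "S"): 0.3}, 3)
print(dist)

# With 21 terminals, lengths 1 through 4 give this many candidate strings:
print(sum(21 ** i for i in range(1, 5)))   # 204204
```

The candidate count reproduces the 204,204 utterances mentioned in footnote 9, which is why the brute-force approach becomes expensive so quickly.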



CHAPTER 5 -- SEMANTICS

I. METAMATHEMATICAL SYNTAX AND SEMANTICS

Model-theoretic semantics was invented by Alfred
Tarski to make precise the notion of the meaning of a
first-order sentence in terms of a set of objects D
called the domain of the model, and a set of primitive
relations and functions on the domain (1). The primitive
terms of a first-order language are the variables and
constants. It is convenient to allow that these denote
individual objects in the domain. Complex terms and
formulas then have their denotations defined recursively
from the denotations of the simple terms and the rules of
composition given in the language.

I offer the following simple example of a
first-order language L1, with its truth definition.
(There is, of course, nothing new in this treatment. I
give it simply to provide continuity of notation.) The
language is a fragment of quantifier-free arithmetic; for
simplicity, I omit the quantifiers and variables they bind,
and consider only a more restricted case.

(1) See "The Concept of Truth in Formalized
Languages" in Logic, Semantics, and Metamathematics by
Alfred Tarski.
The language L1:

     constant terms:         a, b

     function symbol:        +    a two-place operator

     predicate symbol:       =    a two-place predicate

     parentheses:            ( )  to show grouping

     logical connectives:    ->, v, &, ¬

1. The set T of terms contains the constant terms,
   and if x,y are in T then (x + y) is in T.
   Nothing else is in T.

2. The set F of formulas contains:

     i) if x,y ∈ T then (x = y) ∈ F;

    ii) if a,b ∈ F then (a -> b) ∈ F,
                        (a v b) ∈ F,
                        (a & b) ∈ F,
                        (¬a) ∈ F;

   iii) Nothing else is in F.

The intended model for L1 is the domain D of
the non-negative integers, where the symbol + means
addition, the symbol = means equality of two integers, the
constant a denotes 0, and the constant b denotes 1. Note
that the domain satisfies the familiar property of closure,
whereby if i,j are in D then the sum of i and j is also
in D. This is necessary since all of these sums represent
terms in the language, and each term must denote.

I now give, informally, the rules for the meanings
of the formulas in F. Notice that each rule corresponds
to a way or process by which formulas are created.

     i) (x = y) is true just in case the denotation
        of x is identical to the denotation of y;

    ii) (a -> b) is true just in case if a is true,
        then b is true;

   iii) (a v b) is true just in case a is true
        or b is true;

    iv) (a & b) is true just in case a is true
        and b is true;

     v) (¬a) is true just in case a is false.

We can now show that each formula of F is either
true or false under the model provided, and it is clear
that the interpretation is "intuitively satisfactory" --
i.e., the "true" formulas correspond to well-known truths
of arithmetic.

The above interpretation for L1 is deceptively
satisfying. Nothing about the syntax requires that this,
the intended interpretation, be the only one. In
particular, we have stated no axioms to even guarantee that
such properties as commutativity or transitivity apply to
the function symbol +. A primary goal of model theory is
to characterize, given a language, the classes of models
that various sets of sentences in the language can have.
In order to do this it is necessary to characterize the
notion of a model.

The characterization is that of a relational
structure. Let

     M = <D, P1,...,Pn, F1,...,Fm, a1,...,ak>

(where m,n,k are natural numbers). M is a relational
structure if and only if

     i) D is a non-empty set of objects;

    ii) for each Pi, i=1,...,n, there is an ri,
        called the rank of Pi, such that
             Pi ⊆ D^ri;

   iii) for each Fi, i=1,...,m, there is an ri, again
        called the rank of Fi, such that
             Fi: D^ri -> D
        (i.e., Fi is a function on D^ri into D);

    iv) each ai, i=1,...,k, is an element of D.

Following this definition, the class of models for
the language L1 is any structure

     M = <D,F,A,B>

where D is nonempty, F is a function on D^2 into D,
and A,B are elements of D.

It is not enough to give a model for L1; it
is also necessary to show how valuations for each f ∈ L1
are constructed. This is done by associating semantical
rules, in the form of set-theoretical functions, with the
rules of formation for the formulas of L1.



VALUATION OF TERMS:

     i) basis conditions

             [a] = A
             [b] = B

        ([a], or more explicitly [a] sub M, means the
        valuation of a in M.)

    ii) recursion condition

             [(x + y)] = F([x],[y]).

VALUATION OF FORMULAS:

     i) basis condition

             [(x = y)]  = if [x] = [y] then true else
                          false.

    ii) recursion conditions

             [(x -> y)] = if [x] is false or [y] is true
                          then true else false

             [(x v y)]  = if [x] is true or [y] is true
                          then true else false

             [(x & y)]  = if [x] is true and [y] is true
                          then true else false

             [(¬x)]     = if [x] is true then false else
                          true

There is an important distinction to be made
between three kinds of symbols in the language. Some
symbols -- a,b,+ -- denote objects in the model M;
these I call denoting symbols. Other symbols --
=,->,v,&,¬ -- signal the use of certain semantic rules,
such as implication or identity, but do not denote objects
in the model. Finally,
parentheses (and sometimes commas, brackets, and braces)
make grouping clear. These I call utility symbols.
Utility symbols may be eliminated from first-order logic by
using Polish notation, wherein the order is implicit.



II. CONTEXT-FREE AND METAMATHEMATICAL SYNTAX

The treatment of the language L1 given in Section
I corresponds in style to that usually encountered in logic
textbooks. It is worth noting, given the convention of
using generative grammars in linguistic studies, that there
is a certain correspondence between the definition of
syntactic classes by giving closure conditions of sets,
above, and the use of context-free grammars. The language
L1 can, for example, be defined by the following cfg G,
where

     G = <V,T,F,P>

     V = { =,->,v,&,¬,+,a,b,(,),T,F }

     T = V - {T, F}

and P contains the rules

     (1,1)   F -> (T = T)
     (1,2)   F -> (F -> F)
     (1,3)   F -> (F v F)
     (1,4)   F -> (F & F)
     (1,5)   F -> (¬F)

     (2,1)   T -> a
     (2,2)   T -> b
     (2,3)   T -> (T + T)

Then, the semantic rules associated with the
closure conditions can be associated instead with the
productions of G, mutatis mutandis.
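
As an illustration of attaching the semantic rules to the productions, the following Python sketch parses a string of L1 and computes its valuation in a model <D,F,A,B> in one pass. It is only a sketch under stated assumptions: the function names are hypothetical, the character ~ is typed in place of the negation sign, and the default model is the intended one (A = 0, B = 1, F = addition).

```python
import re

TOKEN = re.compile(r"\s*(->|[ab+=v&~()])")

CONNECTIVES = {                      # semantic rules for productions (1,2)-(1,4)
    "->": lambda x, y: (not x) or y,
    "v": lambda x, y: x or y,
    "&": lambda x, y: x and y,
}


def tokenize(text):
    tokens, pos = [], 0
    text = text.strip()
    while pos < len(text):
        match = TOKEN.match(text, pos)
        if match is None:
            raise ValueError(f"unexpected symbol at position {pos}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


def valuation(text, A=0, B=1, F=lambda x, y: x + y):
    """Value of an L1 formula in the model <D,F,A,B>; '~' stands for negation."""
    tokens, pos = tokenize(text), 0

    def take():
        nonlocal pos
        if pos >= len(tokens):
            raise ValueError("unexpected end of input")
        pos += 1
        return tokens[pos - 1]

    def expect(kind, wanted):
        if kind != wanted:
            raise ValueError("term and formula are mixed up")

    def parse():
        """Return ('T', object) for a term or ('F', truth value) for a formula."""
        tok = take()
        if tok == "a":                                    # rule (2,1)
            return ("T", A)
        if tok == "b":                                    # rule (2,2)
            return ("T", B)
        if tok != "(":
            raise ValueError(f"unexpected symbol {tok!r}")
        if pos < len(tokens) and tokens[pos] == "~":      # rule (1,5)
            take()
            kind, value = parse()
            expect(kind, "F")
            result = ("F", not value)
        else:
            left_kind, left = parse()
            op = take()
            right_kind, right = parse()
            wanted = "T" if op in ("+", "=") else "F"
            expect(left_kind, wanted)
            expect(right_kind, wanted)
            if op == "+":                                 # rule (2,3)
                result = ("T", F(left, right))
            elif op == "=":                               # rule (1,1)
                result = ("F", left == right)
            elif op in CONNECTIVES:                       # rules (1,2)-(1,4)
                result = ("F", CONNECTIVES[op](left, right))
            else:
                raise ValueError(f"unexpected operator {op!r}")
        if take() != ")":
            raise ValueError("missing ')'")
        return result

    kind, value = parse()
    if kind != "F" or pos != len(tokens):
        raise ValueError("not a formula of L1")
    return value


print(valuation("((a + b) = b)"))          # True: 0 + 1 = 1
print(valuation("((b + b) = b)"))          # False: 1 + 1 is not 1
print(valuation("((a = b) -> (b = a))"))   # True
```

Passing other values for A, B, and F evaluates the same formula in a different structure, which is the sense in which nothing in the syntax fixes the intended interpretation.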

It is of some interest to ask what the relation is
between context-free grammars and the kinds of definitions
obtained by giving closure conditions on classes, since the
former is standard in linguistics while the latter is used
extensively as the syntactical basis for model theory. The
usual requirement for logical syntax is that the sets must
be recursive, and there are recursive sets that are not
context-free. However, the full complement of recursive
methods is not needed for the fundamental syntactic notions
of the formal languages of mathematical logic; several such
syntactic classes are usually defined by a kind of closure
that I call simple closure. It is necessary to formalize
this notion of simple closure, as a kind of syntactic
meta-meta theory of mathematical logic.
NOTATION:  a,b,c are syntactic objects;
           S,T are sets of syntactic objects;
           x,y,z are syntactic variables
                 ranging over sets of syntactic
                 objects.

The following are primitives:

     a set M of symbols;

     an operation & on symbols in M
     known as concatenation (2);

     a symbol mem denoting membership,
     e.g., a mem S;

     the symbol then denoting a conditional;

     the symbol and denoting a conjunction.

Syntactic Objects (S.O.)

     i) M ⊆ S.O. -- i.e., symbols are syntactic
        objects;

    ii) if σ, ρ ∈ S.O., then σ & ρ ∈ S.O.

S.O. corresponds to the class T+ associated with
context-free grammars.

(2) The set M corresponds to the terminal
vocabulary T of a cfg G. However, the operation for a
grammar corresponding to concatenation is to put a space or
a plus sign between two symbols being concatenated.
Concatenation is intuitively putting symbols side by
side; but the grammarian does not write

     ADJN

but rather

     ADJ N

or

     ADJ + N

The problem is one of notation, hinging on the difference
between a "symbol" as a formal object and a "symbol" as a
typographical character.
Syntactic Terms (S.T.)

     i) S.O. ⊆ S.T.;

    ii) if x is a syntactic variable then
        x ∈ S.T.;

   iii) if σ, ρ ∈ S.T. then σ & ρ ∈ S.T.

S.T. corresponds to the class V+ associated with a cfg.

Positive Boolean Expressions (P.B.E.)

     i) if x is a syntactic variable and S
        is a set, then
             x mem S
        is a P.B.E.;

    ii) if Γ1, Γ2 ∈ P.B.E., and no syntactic variable
        occurring in Γ1 occurs in Γ2 or conversely, then
             Γ1 and Γ2 ∈ P.B.E.

Simple Closure Conditions (S.C.C.)

     i) if σ ∈ S.O. then
             σ mem S
        is an S.C.C. (on S);

    ii) if Γ ∈ P.B.E., σ ∈ S.T., then
             Γ then σ mem S
        is an S.C.C. (on S);

   iii) the extremal clause ("Nothing else is in S.")
        is an S.C.C. (on S).

S is defined by simple closure iff only finitely
many S.C.C. define S. S1, S2, ..., Sn may be defined
simultaneously provided there are no infinitely descending
sequences of definition.

Theorem 1: The class of simple-closure definable
S is equivalent to the class of context-free languages.

I indicate the proof by giving the algorithms for
generating a set of S.C.C. given a cfg, and conversely.

1) CFG => S.C.C. Suppose we have a grammar

     G = <V,T,S_G,P>.

Then, let

     M = T.

We are to define, by S.C.C., the class S corresponding
to L(G).

First, rewrite G into equivalent Chomsky normal
form

     G' = <V',T,S_G,P'>.

Each rule in P' is of the form

     i) A -> a

or

    ii) A -> B C

where A,B,C are non-terminals, and a is a terminal.
(See Chapter 3.) For each rule of the form i), use
the S.C.C.

     a mem A;

for each rule of the form ii), use the S.C.C.

     (x mem B) and (y mem C) then x & y mem A.

It is clear that S = L(G).

2) S.C.C. => CFG

Suppose S is defined by simple closure.
Then we need a grammar

     G = <V,T,S_G,P>.

Let

     T = M

and let S_G be a symbol corresponding to the class S
defined by simple closure. Then, if

     σ mem A

is an S.C.C. on A, for σ ∈ S.O., then let

     A -> σ

be in P. Since σ is an S.O., it is a non-empty
string of symbols in M.

Suppose Γ ∈ P.B.E., σ ∈ S.T., and

     Γ then σ mem A

is an S.C.C. Then we reduce this according to the rules
for P.B.E. If Γ is of the form

     x mem B,

then replace occurrences of x in σ by B and call
the result of this replacement σ[B substituted for x].
Then, add to P the rule

     A -> σ[B substituted for x]

If Γ is of the form

     Γ1 and Γ2

then perform any such replacements of the variables
in Γ1 and Γ2 into the variables in σ. Notice that,
since rule ii) for P.B.E. requires that no
syntactic variable in Γ1 occur in Γ2 (and conversely),
there will be no problem in making this substitution.
Now, add to P the following rule:

     A -> σ[correct variable substitutions].

It is clear that the above translation will, with the
appropriate proofs by induction, yield the actual proof of
the theorem.

Many of the elementary syntactical notions of the
first-order predicate logic can be defined by simple
closure; hence, by the above translation, an equivalent
context-free grammar can be obtained. The sets of
variables, predicates, terms, and well-formed formulas are
examples. In practice it is customary to assume an
infinite class of variables, and since the above
formalization of S.C.C. allows only a finite class of
symbols, some way of generating the variables, e.g. using
prime symbols, is necessary. The following defines the
class of variables VAR, assuming primitive symbols v and
'.

     i) v mem VAR;
    ii) x mem VAR then x' mem VAR;
   iii) Nothing else is in VAR.

Infinitely many constants, as well as infinitely many
predicates and functions of arbitrary type, can be
generated by similar devices.
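
The translation from simple closure conditions to productions can be sketched for the VAR example. The data representation (a dictionary of premises, a list for the concatenated term) and both function names are assumptions made for illustration; the extremal clause needs no production and is left out.

```python
def scc_to_production(premises, term, target):
    """Translate one simple closure condition into a context-free production.

    premises: maps each syntactic variable to the set it ranges over,
              e.g. {"x": "VAR"} for "x mem VAR"; empty for a basis condition.
    term:     list of symbols and variables, read as their concatenation.
    target:   name of the set being defined.
    """
    right_side = tuple(premises.get(item, item) for item in term)
    return (target, right_side)


def generate(productions, start, max_length):
    """All terminal strings of at most max_length symbols derivable from start."""
    nonterminals = {left for left, _ in productions}
    found, agenda, seen = set(), [(start,)], {(start,)}
    while agenda:
        form = agenda.pop()
        index = next((i for i, s in enumerate(form) if s in nonterminals), None)
        if index is None:
            found.add("".join(form))
            continue
        for left, right in productions:
            if left == form[index]:
                new = form[:index] + right + form[index + 1:]
                # Right sides are non-empty, so forms never shrink.
                if len(new) <= max_length and new not in seen:
                    seen.add(new)
                    agenda.append(new)
    return sorted(found, key=len)


conditions = [
    ({}, ["v"], "VAR"),                    # i)  v mem VAR
    ({"x": "VAR"}, ["x", "'"], "VAR"),     # ii) x mem VAR then x' mem VAR
]
productions = [scc_to_production(*c) for c in conditions]
print(productions)    # [('VAR', ('v',)), ('VAR', ('VAR', "'"))]
print(generate(productions, "VAR", 4))    # ['v', "v'", "v''", "v'''"]
```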

While the set of well-formed formulas WFF is
defined by S.C.C. and is hence a cfl, the set of sentences
of a first-order language, STCE, is not definable by
S.C.C. (3). Also, the class TAUT of tautologies is not a
cfl. (Obviously, the class LT of theorems of first-order
logic cannot be a cfl since, by Church's theorem, that
class is not even recursive. It is less obvious that
recursive classes, such as the class of tautologies, are
not cfl.)

(3) A sentence is a formula with no free
occurrences of variables, where an occurrence is free if it
is in the scope of no quantifier binding that variable.

The results that STCE and TAUT are not cfl can be
proven by use of a result known as the "uvwxy theorem".

Theorem 2 (the "uvwxy theorem"):
Let L be any cfl. Then, there exist constants p,q
depending only on L such that if there is a word z in
L with |z| > p (where |z| is the number of symbols
in z), then z may be written as z = uvwxy, where
|vwx| <= q and v and x are not both ε (the
empty symbol), such that for each integer i >= 0,

     u v^i w x^i y

is in L (4).

This theorem limits the amount of context checking
that a cfg can perform; intuitively, it says that a finite
number of sentences can be checked for context, but an
effort to check several contexts over an infinite class of
sentences will result in some extraneous strings being
accepted by the grammar. The theorem makes it explicit how
to find such extraneous sentences.

I will indicate how Theorem 2 is used by proving
the following result.

Theorem 3: The set of sentences of a first-order
language with a single monadic predicate P is not a cfl.

(4) For a proof of the uvwxy theorem, see
[Hopcroft-Ullman], pp. 51-52.
Suppose to the contrary that STCE of the language
with one monadic predicate is a cfl L. Then, for each
natural number j, the formula

     z_j = ∀v'^j(P(v'^j) -> P(v'^j))

(where v'^j is v followed by j primes) is in L, since
these are closed WFF's. Let p,q be the constants
guaranteed by the "uvwxy" theorem. Then, select a j
such that

     |z_j| > p

and

     |∀v'^j| > q.

Clearly, j is a simple function of p,q; further, z_j
is in L. This satisfies the hypotheses of the "uvwxy"
theorem, so we know that we can rewrite z_j as uvwxy
such that v and x are not both empty, and for each
i >= 0,

     u v^i w x^i y  is in STCE.

The key is to show that any way of dividing zj
(using linear notation for z sub j) into segments
u,v,w,x,y will not avoid the extraneous introduction of
some non-sentence into STCE. A counterexample to the
proof would be one (nonempty) subsequence of zj that
could be repeated indefinitely without generating a
non-sentence, or a pair of subsequences that can be
repeated together. The subsequence consisting of the
quantifier and its variable could be repeated indefinitely
and still yield an STCE; however, zj was chosen so that
the length of the quantifier and its variable would be
larger than q, so this subsequence will not satisfy the
hypotheses of the theorem. The only other repeatable
subsequences are the strings of primes that make
variables. Picking just one such subsequence will clearly
cause non-sentences to be introduced. We can pick two such
subsequences, repeating them together, such as the
following division would indicate:

     ∀v''(P(v'') -> P(v''))

     u = ∀v'    v = '    w = (P(v'    x = '    y = ) -> P(v''))

But then

     ∀v'(P(v') -> P(v''))

is in STCE, and it is a non-sentence.
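
The effect of repeating two strings of primes together can be checked mechanically. The sketch below is an illustration only: the checker recognizes just the strings of the shape used in the proof (it is not a general test for sentences), the function names are hypothetical, and the particular division into u, v, w, x, y is one possible choice.

```python
import re

# Strings of the shape  ∀v'...'(P(v'...')->P(v'...'))  and nothing else.
SHAPE = re.compile(r"∀v('*)\(P\(v('*)\)->P\(v('*)\)\)")


def z(j):
    """The formula z sub j, whose variable carries j primes."""
    var = "v" + "'" * j
    return f"∀{var}(P({var})->P({var}))"


def is_sentence(text):
    """True iff text has the shape above and both occurrences are bound."""
    match = SHAPE.fullmatch(text)
    if match is None:
        return False
    bound, first, second = match.groups()
    return bound == first == second


def pump(u, v, w, x, y, i):
    """The string u v^i w x^i y of the uvwxy theorem."""
    return u + v * i + w + x * i + y


# One division of z(2); v and x are single primes in the first two variables.
u, v, w, x, y = "∀v'", "'", "(P(v'", "'", ")->P(v''))"
print(pump(u, v, w, x, y, 1) == z(2))         # True: the division is of z(2)
print(is_sentence(z(2)))                      # True
print(pump(u, v, w, x, y, 0))                 # ∀v'(P(v')->P(v''))
print(is_sentence(pump(u, v, w, x, y, 0)))    # False: v'' occurs free
```

Repeating v and x twice fails in the same way, because the third occurrence of the variable keeps its original number of primes.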

The "uvwxy" theorem illustrates the following
point: two counters (such as the counters on the number of
primes) can be kept together by a cfg. But, if there are
three or more counters, then each pair must be kept
together by a different process in the grammar, and hence
some extraneous results are unavoidable.
Notice that the set

     { ∀v'^i P(v'^i) | i >= 0 }

is a cfl; the appropriate grammar, with F as the start
symbol, contains the productions:

     (1,1)   F -> ∀vA)

     (2,1)   A -> P(v

     (2,2)   A -> 'A'

It is interesting to note that this grammar bears little
relation to the semantics likely to be given to the
formulas in question.
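
A short sketch of how this grammar keeps the two prime counters together: each use of rule (2,2) adds one prime on either side of A at once. The function name is hypothetical, and the derivation order (rule (2,1) first, then wrapping) is just a convenient way of writing the same derivation.

```python
def derive(i):
    """Derive the string for i primes using rules (1,1), (2,1), and (2,2)."""
    a = "P(v"                    # rule (2,1): A -> P(v
    for _ in range(i):
        a = "'" + a + "'"        # rule (2,2): A -> 'A', one prime on each side
    return "∀v" + a + ")"        # rule (1,1): F -> ∀vA)


for i in range(3):
    print(derive(i))             # ∀vP(v), ∀v'P(v'), ∀v''P(v'')
```

The derivation tree nests the bound variable's primes around the predicate, which is the sense in which the grammar bears little relation to the likely semantics.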

While the closed formulas of firstT-orcier lOjic do 
not form a cfl, it is well to point out tne sense in which 
this would no'- be a restriction on a semantical theory 
based on a context-free treatment of first-order lo^ic. 
The class WFF is a cfl, and we can allow tnat o.^ex> 
formulas are meaningful. The usual convention is to let an 
open formula be equivalent to its universal closure-- 1 .e. , 
the formula obtained by surrounding the given open formula 
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Wy universal quantifiers for each variable occurring free 
in the formula. (At le-ot one text, Introduction to Logic 
by ?• Suppes, uses the oaalogous existential closure * ) 

However, there is a real sense in which Theorem 3
limits the power of any semantics based on context-free
languages. The concept that I propose using is that of a
context-free semantics; I shall say that a semantics
defined on a language is context-free if it is computable
by a push-down automaton. The idea is that we cannot first
present a cfg G, give an arbitrary algorithm for computing
the meaning of a sentence in L(G), and then claim that
the semantics itself is "context-free" because G is.
First-order logic is such an example: a semantics on, say,
WFF, must contain an algorithm for determining what the
free occurrences of variables in a formula are. This
algorithm cannot be represented by a push-down automaton;
if it could, we could write a cfg for STCE, which Theorem
3 claims we cannot do. Hence, the grammar underlying such
a semantics for first-order logic must be
context-sensitive.
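
The free-variable computation that such a semantics must
contain can be sketched as follows. This is only an
illustration under assumed conventions: formulas are nested
Python tuples, and the tags 'pred', 'not', 'and', 'or',
'imp', and 'forall' are hypothetical names that do not come
from the text. The recursion carries along a set of bound
variables whose size is not fixed in advance, which is the
kind of bookkeeping the argument above is concerned with.

```python
def free_vars(formula, bound=frozenset()):
    """Return the set of variables occurring free in a formula.

    Assumed (hypothetical) encoding:
      ("pred", name, var, ...)   atomic formula
      ("not", sub)               negation
      ("and"|"or"|"imp", l, r)   binary connective
      ("forall", var, sub)       universal quantifier
    """
    tag = formula[0]
    if tag == "pred":
        return {v for v in formula[2:] if v not in bound}
    if tag == "not":
        return free_vars(formula[1], bound)
    if tag in ("and", "or", "imp"):
        return free_vars(formula[1], bound) | free_vars(formula[2], bound)
    if tag == "forall":
        return free_vars(formula[2], bound | {formula[1]})
    raise ValueError(f"unknown formula tag: {tag!r}")


def is_closed(formula):
    """A formula is closed when no variable occurs free in it."""
    return not free_vars(formula)


def universal_closure(formula):
    """Prefix one universal quantifier per free variable."""
    closed = formula
    for var in sorted(free_vars(formula)):
        closed = ("forall", var, closed)
    return closed


# P(v) & (forall v') Q(v, v'): v is free, v' is bound.
open_formula = (
    "and",
    ("pred", "P", "v"),
    ("forall", "v'", ("pred", "Q", "v", "v'")),
)
```

Here `universal_closure(open_formula)` wraps the formula in a
single quantifier on v, which is the convention for open
formulas described above.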

I think it is important not to consider this a
limitation on the whole approach given here. It is
admitted that natural language, with or without complex
mathematical expressions, is not context-free. This does
not preclude that there are large and useful fragments that
are context-free. Moreover, the experience gained from
working with context-free grammars may be easily
transferred to work with more powerful classes of grammars.
A valuable point is that first-order logic can be
put into the framework of generative grammar at all.

Theorem 1, while mathematically trivial, has a
philosophically important message in the context of much
current work in computational linguistics. As Suppes
explains (5):

    A line of thought especially popular in the
    last few years is that the semantics of a
    natural language can be reduced to the
    semantics of first-order logic. The central
    difficulty with this approach is that now as
    before how the semantics of the surface
    grammar is to be formulated is still unclear
    ... how can explicit formal relations be
    established between first-order logic and the
    structure of natural languages?

    (emphasis added)



The difficulty of looking for first-order
representations of natural language is not here considered
to be that first-order logic is insufficiently expressive.
As I have attempted to show, it is semantically more
powerful than context-free grammars. I should be happy
with any formal language representation of natural language
(into even a programming language such as LISP or ALGOL) as
long as there existed a powerful theoretical "translation"
between the surface of natural language and the formal
language. The superficial arguments for using set-language
over first-order logic are those of custom dating back to
Tarski, of convenience, and the fact that first-order logic
has its semantics given in terms of set-language. The
deeper reason is that first-order logic can be defined by
generative grammars (some concepts admittedly requiring
context-sensitivity), and so we may think of the semantics
for natural language, based on generative grammar, as being
amenable to the set-theoretical approach that has been so
successful for symbolic logic. An intermediate pass
through first-order sentences does not appear to be a gain
in clarity or concept.

(5) [Suppes-2], p. 1.



III. MODEL STRUCTURES AND CFG

The basic idea behind any semantics for a cfg is
that the terminal symbols (tend to) denote set-theoretical
objects in the model structure, and the rules of the
grammar (tend to) be interpreted by set-theoretical
functions. In practice, there is however a certain
tradeoff between the denotations given the terminals and
the functions associated with the rules, that is, in which
symbols are denotative and which are logical. It seems
that a certain number of philosophical controversies have
been engendered from this possibility of a tradeoff.

As an example, looking at the language L1 for a
fragment of quantifier-free arithmetic (see Section I
above), the following alternative model structure M and
set-theoretic rules can be given for L1.

M = < D, PLUS, EQUAL, IMP, OR, AND, NOT, 0,
1, TRUE, FALSE >

where

i) D = ω ∪ { TRUE, FALSE }, where ω is the
set of natural numbers;
ii) PLUS is a function from D^2 into D
(denoted PLUS: D^2 -> D);
iii) EQUAL: D^2 -> {TRUE, FALSE};
iv) IMP: {TRUE, FALSE}^2 -> {TRUE, FALSE};
similarly for OR, AND;
v) NOT: {TRUE, FALSE} -> {TRUE, FALSE};
vi) 0, 1, TRUE, FALSE ∈ D.
The following are the denotation rules, assigning
objects in M to terminals in L1. If α is a symbol
or a sequence of symbols, let [α] be the denotation of
α in M. Then,

[=] = EQUAL
[=>] = IMP
[&] = AND
[v] = OR
[¬] = NOT
[+] = PLUS
[0] = 0, [1] = 1
[(] = 0, [)] = 0.

Finally, I give the functions corresponding to the
rules of the cfg G that generates L1.

LABEL   RULE             FUNCTION

(1.1)   F -> (T = T)     { b | (∃ <[T],[T],b> ∈ [=]) }
(1.2)   F -> (F => F)    { b | (∃ <[F],[F],b> ∈ [=>]) }
        for rules (1.3), (1.4), and (2.3)
        similar functions are required;
(1.5)   F -> (¬F)        { b | (∃ <[F],b> ∈ [¬]) }

The model M is somewhat unusual. The point in
its construction was to make every linguistically
significant symbol have a denotation and to eliminate the
notion of a 'logical' symbol. Even the parentheses
"denote", but since they do not play a part in the
set-theoretical functions it is of no consequence. (The
use of parentheses is, of course, to avoid ambiguity.) All
the work is done by the denotations given to the terminal
symbols. The semantic functions simply say to apply the
arguments in the appropriate manner, and thus have no real
content.
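
The division of labor in M, with denoting terminals and
contentless rule functions, can be sketched in a few lines.
This is an illustrative reading only, under these
assumptions: truth-values are the strings "TRUE" and
"FALSE", trees are nested tuples with the parentheses
dropped, '-' stands in for the negation sign, and PLUS is
only defined on numbers (in M it is total on D). The names
DENOTATION and value are hypothetical.

```python
TRUE, FALSE = "TRUE", "FALSE"


def _truth(flag):
    """Map a Python bool onto the model objects TRUE / FALSE."""
    return TRUE if flag else FALSE


# Basis denotations: every terminal denotes an object or a function.
DENOTATION = {
    "=": lambda a, b: _truth(a == b),                    # EQUAL
    "=>": lambda p, q: _truth(p == FALSE or q == TRUE),  # IMP
    "&": lambda p, q: _truth(p == TRUE and q == TRUE),   # AND
    "v": lambda p, q: _truth(p == TRUE or q == TRUE),    # OR
    "-": lambda p: _truth(p == FALSE),                   # NOT
    "+": lambda a, b: a + b,                             # PLUS (numbers only)
    "0": 0,
    "1": 1,
}


def value(tree):
    """Valuation of a tree.

    A tree is a terminal string, a pair (op, sub) for negation, or a
    triple (left, op, right).  The rule functions only apply the
    operator's denotation to the values of the arguments.
    """
    if isinstance(tree, str):
        return DENOTATION[tree]
    if len(tree) == 2:
        op, sub = tree
        return DENOTATION[op](value(sub))
    left, op, right = tree
    return DENOTATION[op](value(left), value(right))


# (((0+1) = 1) => (0 = 0))
formula = ((("0", "+", "1"), "=", "1"), "=>", ("0", "=", "0"))
```

Note that `value` contains no arithmetic or logic of its own;
changing what '=>' means requires editing only its entry in
DENOTATION.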

While it may seem arbitrary whether this model is
used, or the more usual one given in Section I above, the
question of which symbols denote objects is a key
disputation in much philosophical work. The Frege-Russell
tradition of the ontological status of propositions is
based in, or at least permitted by, the formal plausibility
of objects in a model that behave like propositional
functions. As is well known, paradoxes creep into somewhat
richer languages than L1 when semantical notions such as
'true' and 'false' are given ontological status. One
solution is type theory with its hierarchy of propositional
functions; but this is beyond the limits of my discussion.
Without committing myself to any position whatever
regarding the status of propositions, the formal fact
remains that there is an interplay between the denotations
of the terminal symbols and the set-theoretical functions
associated with the rules of the grammar.

As an example of the problem I would like to avoid,
consider the noun-phrase

*) capitol of France.

*) contains a prepositional phrase. A reasonable way to
interpret prepositions is as some kind of function.
Consider however two alternative grammars and the semantics
they offer for *).

G1:  (1.1) NP -> NP of NP

     (1.2) NP -> N

     (1.3) NP -> PN

plus, of course, the appropriate lexicon. The semantic
functions corresponding to the rules of G1 are:

     (1.1) OF([NP],[NP])

     (1.2) the identity function

     (1.3) the identity function

Then, the semantic tree for *) is:

              NP: OF([capitol],[France])
             /           |            \
    NP: [capitol]        of        NP: [France]
          |                             |
    N: [capitol]                  PN: [France]
          |                             |
  capitol: [capitol]            France: [France]



Notice that the word 'of' does not denote; instead the rule
(1.1) assumes a rather dubious set-theoretical function.
'Of' is a logical symbol in the grammar. An alternative
grammar G2 has a denotation [of] in the model. The
rules of G2 are:

     (1.1) NP -> NP PREP NP

     (1.2) NP -> N

     (1.3) NP -> PN

with the appropriate lexicon; the semantical rules are:

     (1.1) [NP] ∩ { a | (∃ <a,b> ∈ [PREP]) (b ∈ [NP]) }

     (1.2) identity

     (1.3) identity

G2 is to be preferred to G1 in that it makes
clear a kind of ontological commitment: namely, that the
information about the function associated with the
preposition 'of' has to be a part of the model structure
(which is, in relation to Erica's linguistic behavior, a
data base) and cannot be considered a part of the
set-theoretical functions available (which correspond to
the machinery of language processing) (6). It is my belief
that much of the talk about the ontological commitment of
natural languages would benefit from an understanding of
this kind of a tradeoff.
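
The tradeoff between G1 and G2 can be made concrete with a
toy data base. Everything named here is an assumption for
illustration: the individuals, the relation OF_RELATION, and
in particular the body of OF, which the text calls dubious
and does not define. The sketch only shows where the
information about 'of' has to live in each grammar.

```python
# Hypothetical individuals in the domain.
PARIS, ROME, FRANCE, ITALY = "Paris", "Rome", "France", "Italy"

# Basis denotations: nouns and proper nouns denote sets of objects.
CAPITOL = {PARIS, ROME}
FRANCE_SET = {FRANCE}

# G2: the preposition denotes a relation stored in the model.
OF_RELATION = {(PARIS, FRANCE), (ROME, ITALY)}


def np_prep_np(np1, prep, np2):
    """G2 rule (1.1): [NP] ∩ { a | (∃ <a,b> ∈ [PREP]) (b ∈ [NP]) }."""
    return np1 & {a for (a, b) in prep if b in np2}


def OF(np1, np2):
    """G1 rule (1.1): 'of' does not denote, so the same relation has to
    be hidden inside the rule's function (an assumed definition)."""
    hidden = {(PARIS, FRANCE), (ROME, ITALY)}
    return np1 & {a for (a, b) in hidden if b in np2}


g1_value = OF(CAPITOL, FRANCE_SET)
g2_value = np_prep_np(CAPITOL, OF_RELATION, FRANCE_SET)
```

Both grammars give the same valuation for 'capitol of
France'; they differ in whether the capital-of facts are
data (G2) or part of the processing machinery (G1).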

Further, I think that this appears to run contrary
to much of the talk about the 'logic' of various words. It
seems to me that much of the talk about, say, the way in
which modal notions ('believe', 'know') are used has
suffered from too little empirical evidence. Hence, if I
am uncertain about how a word functions semantically, I
prefer to make a commitment to an object in the data base
representing that word, in the hope of collecting some hard
data on the use of the word. This puts the emphasis upon
understanding linguistic behavior rather than analyzing
concepts, but only because I think the former has been
overlooked. In the case of modal concepts, a more
complicated structure is needed than the one I have given
for ERICA; I have tried to consider only the extensional
case, leaving modal notions as transparent. Readers
familiar with Kripke-Hintikka-Montague semantics for modal
notions in modal logic (best thought of as an extension of
first-order predicate logic) will realize that the
possibility exists of giving more complex set-theoretical
structures.

(6) There is a better way of handling many
prepositions, such as 'of' and 'with', and that is to
create a function by combining the preposition with a
phrase. In *), the appropriate combination is

capitol of

and the commitment is to a function on D mapping objects
(countries) into their capitols and giving some kind of
error condition (say, by returning the null set as the
capitol of non-countries).

In any actual implementation of a data base, I
think this kind of approach would be necessary in order to
give a reasonable structure to the data. I have not used
this approach here, because I am simply too awash in data
already.

IV. SEMANTICS FOR ERICA

The model theory of the classical first-order logic
requires only a simple model-theoretical structure
containing objects in the domain and n-ary relations and
functions on the domain. Natural languages require more
complicated structures than first-order languages.
Following Suppes (7) I give the closure conditions defining
the class H(D), based on a domain D. This will allow
for any finite composition of functions in the natural
hierarchy of sets but may be stronger than any application
requires.

Let D be a nonempty set. (In general, D may be
finite, for my purposes.) Then I define H'(D) to be the
smallest set such that:

i) for each n ∈ ω (the set of natural numbers),
D^n ∈ H'(D);

ii) if A, B ∈ H'(D), then A ∪ B ∈ H'(D);

iii) if A ∈ H'(D), then P(A), the power set of A,
is in H'(D);

iv) if A ∈ H'(D) and B ⊆ A, then B ∈ H'(D).

The denotation of a true sentence will be a special
object TRUE, and likewise a false sentence denotes the
object FALSE. I let

H''(D) = H'(D) ∪ { TRUE, FALSE }.

Since some utterances will in fact express two
"propositions" (see below), we need to allow ordered pairs
of denotations. Hence, let

H(D) = H''(D) ∪ { <x,y> | x,y ∈ H''(D) }.

(7) See [Suppes-2], pp. 10-11.
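
A few members of H(D) can be built explicitly for a
two-element domain. This is a minimal sketch, not a
decision procedure for membership in H(D); the domain
elements and the relation named likes are hypothetical, and
TRUE and FALSE are modelled as strings.

```python
from itertools import chain, combinations, product

TRUE, FALSE = "TRUE", "FALSE"


def cartesian_power(D, n):
    """D^n as a set of n-tuples (closure condition i)."""
    return frozenset(product(D, repeat=n))


def power_set(A):
    """P(A) as a set of frozensets (closure condition iii)."""
    items = list(A)
    subsets = chain.from_iterable(
        combinations(items, r) for r in range(len(items) + 1)
    )
    return frozenset(frozenset(s) for s in subsets)


D = frozenset({"erica", "ball"})
D2 = cartesian_power(D, 2)        # in H'(D) by condition i)
mixed = D | D2                    # in H'(D) by condition ii)

# A binary relation is a subset of D^2, hence in H'(D) by condition iv).
likes = frozenset({("erica", "ball")})

# An ordered pair of denotations, as allowed in H(D).
paired = (TRUE, likes)
```

For a finite D each of these objects is finite, so a data
base can store them directly.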






Set-theoretical functions are now associated with
the rules of a cfg. Let G = <V, T, P, S> be a cfg, and
φ a function on P that assigns to each p ∈ P
exactly one set-theoretical function such that if the
right-hand side of p has n symbols, then φ(p) has n
arguments. The arguments are to be applied to φ(p) in
the same order as they occur in the rhs of p (8). Then
G' = <V, T, P, S, φ> is a potentially denoting cfg.
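
The arity and ordering requirement on φ can be checked
mechanically. The sketch below is one possible encoding,
not the program used for the dissertation: the class
Production and the helper names are hypothetical, and the
three productions are those of the grammar G2 from Section
III.

```python
import inspect
from dataclasses import dataclass
from typing import Callable, Tuple


@dataclass(frozen=True)
class Production:
    label: str
    lhs: str
    rhs: Tuple[str, ...]
    function: Callable  # the single function assigned to this rule


def arity(fn):
    """Number of parameters the function declares."""
    return len(inspect.signature(fn).parameters)


def is_potentially_denoting(productions):
    """True when every rule's function takes one argument per rhs symbol."""
    return all(arity(p.function) == len(p.rhs) for p in productions)


def valuation(p, args):
    """Apply the valuations of the rhs symbols, in rhs order."""
    if len(args) != len(p.rhs):
        raise ValueError(f"rule {p.label} needs {len(p.rhs)} arguments")
    return p.function(*args)


G2 = [
    Production("(1.1)", "NP", ("NP", "PREP", "NP"),
               lambda np1, prep, np2:
                   np1 & {a for (a, b) in prep if b in np2}),
    Production("(1.2)", "NP", ("N",), lambda n: n),
    Production("(1.3)", "NP", ("PN",), lambda pn: pn),
]
```

Because each Production holds exactly one function, a
construction with two interpretations has to appear as two
productions.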



Notice that no rule can have more than one
semantical function associated with it. Should I want a
grammatical construction to have two or more semantic
interpretations, I would proliferate rules in the grammar
accordingly rather than associate more than one function
with a rule. Since a derivation is associated with a tree
(see Chapter 4), this means that if a sentence is
semantically ambiguous, then it is syntactically ambiguous
as well. It seems desirable to mirror semantic ambiguity
in syntactic ambiguity so that if a terminal-form is
semantically ambiguous (i.e., has two or more
interpretations that are not set-theoretically equivalent),
it is grammatically ambiguous as well.

(8) The explanation for the order of arguments
requirement is to provide a fiat solution to a problem
mentioned in [Suppes-2]. The problem can be summarized by
noting that two or more instances of the same symbol may
occur at different nodes of a tree and will generally play
non-interchangeable roles in the semantics of the sentence.
To avoid labeling trees and reformulating the definition of
a derivation accordingly, I simply require that the symbols
on the rhs of a production p have their valuations
applied in order to the set-theoretical function associated
with p. This creates rather strange functions (such as
converse subset), which I ignore by using the standard
set-theoretical terminology as metalinguistic
abbreviations, assuming that all is clear. In any case,
the program that I wrote to do the work knows what is
happening, but it is of no conceptual interest to go
through the thrashing of explicit definition on this point.

The convention I use for my abbreviations is this:
if a symbol occurs two or more times in a string, then the
valuation of the string is written using the symbols with
subscripts that refer to the order in the original string;
if the order of the symbols in the valuation is the same as
the order in the string, then the subscripts are omitted.
For example, I write:

[N LINK N] =

if [N] ⊆ [N] then TRUE else FALSE.

The conditions on H(D) that allow ordered pairs
of denotations need some explanation. Often, the most
reasonable approach to the semantics of an utterance is to
believe that it expresses two (or perhaps more)
propositions. For example, consider the question

Did you go or did you stay?

Clearly, this is two separate questions. Answering "yes"
to the utterance (a favorite response of the logically
sophomoric) misses both the intent of the questioner and
the logic of the question. What is needed is something
like an ordered pair with the elements corresponding to the
two separate questions (9). For such utterances, it will
not be satisfactory to suggest two alternative semantic
analyses. The notion of alternative implies that, while we
have two or more possibilities, only one is correct and to
be acted upon. The idea here is rather that the utterance
conveys two separate packages of information.

In the grammar GE1, there are five rules that
have associated functions using ordered pairs of
denotations, rules (8,4), (8,5), (8,6), (8,11), and (8,15).
Table 1 gives the terminal forms using each rule. It is
most plausible that rules (8,11) and (8,15) should not be
generating a pair of denotations, since there is evidence
in the ERICA corpus that the utterances these
terminal-forms represent are simply repetitions. However,
I have left these rules in the grammar since it is the more
general case.

The full generality of the closure conditions on
H(D) is not realized in ERICA, since the terminal-forms
requiring paired denotations all have an affirming or
negating word as one of the "propositions".



(9) A large part of the informal work that I did
with the ERICA corpus concerns the question-answer pairs;
it is from this subset of ERICA that the clearest view of
the interaction between speakers arises, so I have asked if
the semantics handles these interactions correctly. I plan
a later paper on the semantics of questions with an attempt
to predict the answers, syntactically and semantically.
Unfortunately, the ERICA corpus is a little small for this
analysis, but at IMSSS at Stanford we have a larger corpus
that is being collected under conditions experimentally
superior to those used in ERICA.

TABLE 1

TERMINAL-FORMS IN ERICA REQUIRING PAIRED DENOTATIONS
RESCANNER MODEL OF LEXICAL DISAMBIGUATION


RULE: (8,4)   s -> neg a

FREQ   TERMINAL-FORM

  20   neg n
   8   neg persp link neg
   6   neg adj
   6   neg pron link art n
   5   neg art n
   4   neg mod persp v persp
   4   neg n n
   3   neg adv
   3   neg persp v neg
   3   neg v
   2   neg adj adj
   2   neg pron link n
   2   neg persp link n
   2   neg persp link art n
   2   neg pron link art adj n
   2   neg persp mod neg v prep
   2   neg qu n
   1   neg adj n
   1   neg adv adj
   1   neg art n conj art n
   1   neg mod persp
   1   neg mod persp v pron
   1   neg mod persp v prep persp n
   1   neg n v
   1   neg n pn
   1   neg n neg
   1   neg n/pn v prep pronadj n
   1   neg pn
   1   neg prep qu n
   1   neg persp v n
   1   neg pron link
   1   neg prep persp
   1   neg prep padj n
   1   neg persp v pron
   1   neg persp v art n
   1   neg persp v adj n
   1   neg persp link pn
   1   neg persp v persp
   1   neg persp mod neg
   1   neg prep pronadj n
   1   neg pron conj pron
   1   neg pron link pn n
   1   neg persp link adj
   1   neg pronadj n aux v
   1   neg persp aux neg v
   1   neg persp mod v persp
   1   neg pron link pronadj
   1   neg persp v persp pron
   1   neg persp link neg adj
   1   neg persp link art pn n
   1   neg pron link neg art pn
   1   neg persp link art adj n
   1   neg persp mod neg v pron
   1   neg persp aux v prep persp
   1   neg pron link pronadj adj n
   1   neg persp mod v prep pronadj n
   1   neg persp mod neg v pron prep pron
   1   neg qu
   1   neg v n
   1   neg v pron
   1   neg v persp prep

TYPES = 61   TOKENS = 120

STARRED FORMS HAD TWO TREES
USING THIS RULE ONCE WITH EACH TREE.


RULE: (8,5)   s -> aff a

FREQ   TERMINAL-FORM

  11   aff persp v
   9   aff persp mod
   5   aff persp link
   1   aff mod persp n
   1   aff n
   1   aff pron link
   1   aff persp link adj
   1   aff persp link art n
   1   aff persp mod v persp
   1   aff prep n prep persp

TYPES = 10   TOKENS = 32


RULE: (8,6)   s -> a aff

FREQ   TERMINAL-FORM

   1   persp mod neg v n aff
   1   v aff

TYPES = 2   TOKENS = 2


RULE: (8,11)   s -> aff aff

FREQ   TERMINAL-FORM

  42   aff aff

TYPES = 1   TOKENS = 42


RULE: (8,15)   s -> neg neg

FREQ   TERMINAL-FORM

   5   neg neg

TYPES = 1   TOKENS = 5

Usually the basis for the recursion into H(D) is
provided by a function v on the set of terminals T. If
α ∈ V+, let v(α) be denoted by [α], as an
abbreviation. Thus, terminals denote.

Strings of terminals and nonterminals "denote" in
the sense that the basis denotations of terminals together
with the semantical rules on the grammar generate a
valuation. For example, in the language L1, the formula

*) (((0+1) = 1) => (0 = 0))

"denotes" its truth-value (TRUE), determined by following
the semantic tree for *).

                          F: TRUE
               /             |             \
          F: TRUE            =>          F: TRUE
        /    |    \                    /    |    \
   T: [1]    =    T: [1]          T: [0]    =    T: [0]
  /   |   \         |               |              |
T: [0] + T: [1]   1: [1]          0: [0]         0: [0]
  |        |
0: [0]   1: [1]



I shall write, again as an abbreviation,

[(((0+1) = 1) => (0 = 0))] = TRUE.

There is, however, a distinction that should be made here,
namely, between a denotation made on a (string of)
symbol(s) by a basis assignment, as opposed to the
valuations generated by the rules of the grammar. I say
that the former is a basis valuation. If the basis
valuations on a potentially denoting grammar G into a
model M are all on the terminals of G, then M is
said to be a uniform model for G.

My model for the semantics of ERICA is expressly
not uniform, since I wish to make some basis valuations on
two terminals. The problem arises with verbs that take
prepositions as a part of the verb itself, especially where
the verb may be separated from the preposition.

Let t1, t2 ∈ T. Then t1#t2 means the string
consisting of t1 and t2, with # acting as a space
marker. For some such combinations of terminals, there is
a basis denotation. Such terminals are the separable verbs
together with their associated prepositions. Without
requiring that the parser be context-sensitive, special
set-theoretical functions are associated with the rules
that generate the terminal forms where these separable
verbs occur. Such a set-theoretical function is a
non-uniform function. In the grammar GE1 (see Chapter 4),
there are two rules using non-uniform functions, (3,8) and
(4,35), each having its own associated function. Table 2
lists the terminal forms (from the rescanner lexical
disambiguation model) that require these rules. Each
terminal-form in Table 2 is grammatically unambiguous
relative to GE1.



TABLE 2

SENTENCES GENERATED BY RULES REQUIRING NON-UNIFORM FUNCTIONS


RULE: (3,8)   vp -> verb np prep

FREQ   TERMINAL-FORM

  14   persp v persp prep
  12   persp v pronadj n prep
  10   persp mod v persp prep
   6   persp mod neg v persp prep
   3   n v persp prep
   3   persp v art n prep
   3   persp mod v pron prep
   3   persp mod neg v pronadj n prep
   2   mod persp v persp prep
   2   persp v n prep
   2   persp v pron prep
   2   persp mod neg v art n prep
   1   art n v persp prep
   1   conj persp v art n prep
   1   conj persp v pronadj n prep
   1   conj persp mod v art n prep
   1   conj persp mod neg v persp prep
   1   int pn v pronadj n prep
   1   int persp mod v persp prep
   1   n mod v persp prep
   1   n persp mod v persp prep
   1   n v n prep
   1   persp v qu n prep
   1   persp mod v art n prep
   1   persp mod neg v n prep
   1   persp mod v pronadj n prep
   1   pn v n prep
   1   pn v persp prep

TYPES = 28   TOKENS = 78


RULE: (4,35)   a -> vbl subj prep

FREQ   TERMINAL-FORM

  34   v persp prep
   5   v pron prep
   4   v art n prep
   3   v pronadj n prep
   1   int v persp prep
   1   mod neg v pronadj n prep
   1   neg v persp prep
   1   v n prep
   1   v pn prep
   1   v prep pronadj n prep
   1   v qu n prep

TYPES = 11   TOKENS = 53


There are in ERICA 39 types representing 131 tokens
that require that two terminals have a basis valuation
together. Non-uniformity of a model M could of course
account for the phenomenon of attributivity, such as the
phrase "alleged dictator", but I don't find any great need
for this in the ERICA corpus.

V. SEMANTICS FOR GE1

Most of the lexical categories given in the
dictionary have a specified kind of valuation in H(D).
Since I have tried to use simple semantic functions for
ERICA, a certain complexity is placed upon the basis
valuations of the terminals. I think this is desirable
because it makes an explicit commitment to the information
that is in the "data base" (Erica's perception, her memory,
the physical surroundings of the conversation). Also, it
gives us a feel for the adequacy of simple functions for
the semantics of natural language.

Of course, I cannot give the basis valuations of
the individual words, as they would be spelled out in a
data base dealing with a specific subject matter. Rather,
for each grammatical category, I can indicate the kind of
object in the structure H(D) that is appropriate.

A. NOUNS, PRONOUNS, AND ADJECTIVES

The following grammatical categories have simply
subsets of the domain as their basis valuation:

adj
n
padj
persp
pn
pron
pronadj

These are the nouns, pronouns, and adjectives. Some words,
such as proper nouns, denote one object. Thus, the word

Erica

just refers to the person Erica. By convention, the
denotation [Erica] of the word 'Erica' will be a singleton
set containing the element

Erica.

This should cause no confusion. With this convention, the
semantics is simplified in that the denotation of a noun or
proper noun will always be a set of objects; the semantical
functions assume this.

This group dominates the corpus. Looking back to
the data on dictionary construction, Table 9 of Chapter 3
shows the (relative) numbers of words with the various
lexical classifications. I summarize that data below:

WORDS THAT TAKE SUBSETS OF THE DOMAIN
AS THEIR CLASSIFICATION (10)
(ADJ, N, PADJ, PERSP, PN, PRON, PRONADJ)

                 ENTIRE     ADULT      ERICA
                 CORPUS     PORTION    PORTION

TOTAL TYPES      3,490      3,135      2,039

TAKING SUBSET    2,411      2,169      1f3|b9

PERCENT          69%        69%        pQ%



Hence, by types, 69 percent of the words in ERICA take the
subset denotation according to this model.



B. VERBS

There are four kinds of verbs in the
lexicon:

aux
mod
v

plus the forms of 'to be' that are classed as
link.

There is an important semantical difference between the
forms of 'to be' and other verbs; I discuss the other verbs
first.

The problem with verbs is that they take objects.
More importantly, the same verb will sometimes take 0, 1,
or 2 objects. Consider the (fictitious) examples:

i) I am reading.
ii) John is reading the book.
iii) Mary is reading a blind man the Bible.

One semantic approach is to view i) and ii) as elliptical,
in which case, the semantics might have to account for the
suppressed arguments to the [read] predicate.

(10) These, and other figures of this kind, are
computed from Table 9 of Chapter 3. When a contraction is
encountered in that table, if one of the symbols in the
contraction is the desired symbol it is added in.

An approach that makes less commitment in this
direction is to let the semantics of a verb be of the form

A ∪ B ∪ C,

where A ⊆ D, B ⊆ D^2, C ⊆ D^3,

and D is the domain of the model. A purely intransitive
verb (e.g., 'to run') has B = C = ∅. A verb that always
takes one object has A = C = ∅, B ≠ ∅. Most transitive
verbs can have 1 or 2 objects, and in this case A = ∅,
B ≠ ∅, C ≠ ∅. The more general case is of a mixture.
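
The decomposition of a verb denotation into A, B, and C can
be sketched as follows. As an assumption made for
uniformity, members of A are written as 1-tuples rather
than bare individuals, and the sample verbs and individuals
are invented for illustration.

```python
def split_by_arity(verb):
    """Split a verb denotation into A ⊆ D, B ⊆ D^2, C ⊆ D^3.

    Members of A are represented as 1-tuples (an assumption of this
    sketch), members of B as pairs, members of C as triples.
    """
    A = {t for t in verb if len(t) == 1}
    B = {t for t in verb if len(t) == 2}
    C = {t for t in verb if len(t) == 3}
    return A, B, C


def classify(verb):
    """Name the case a verb denotation falls under."""
    A, B, C = split_by_arity(verb)
    if A and not B and not C:
        return "purely intransitive"
    if B and not A and not C:
        return "always one object"
    if B and C and not A:
        return "one or two objects"
    return "mixture"


RUN = {("Erica",)}
GIVE = {("Mary", "ball"), ("Mary", "Erica", "ball")}
READ = {
    ("I",),                              # I am reading.
    ("John", "the book"),                # John is reading the book.
    ("Mary", "a blind man", "the Bible"),
}
```

With this encoding the three fictitious 'read' sentences are
all accounted for without treating any of them as
elliptical.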

Again referring to Table 9 of Chapter 3, I give the
sums of the types that have one of these three
classifications:

aux
mod
v

WORDS THAT ARE VERBS IN THE ERICA CORPUS DICTIONARY
(LEXICAL CLASSIFICATION aux OR mod OR v)

                  ENTIRE CORPUS    ADULT     ERICA

TOTAL TYPES       3,490            3,135     2,039

TYPES AS VERBS    89^              513       812

% AS VERBS        2^               25%       26%

It is possible to allow verbs to have a large
number of objects, either explicitly or implicitly,
indicating time, place, other personal objects. I have
avoided this for the present.

Verbs classified as LINK (forms of 'to be') are not
included in the above since I have considered them
logical symbols and used semantical rules accordingly.
LINK in a terminal-form signals the use of the subset
function. For example, the terminal-form

12 pron link n

has as its valuation

IF [pron] ⊆ [n] THEN TRUE ELSE FALSE,

and, likewise,

44 [persp v pron] =

IF [persp] ⊆ { a | (∃ <a,b> ∈ [v]) (b ∈ [pron]) }
THEN TRUE ELSE FALSE.

(This notation is read "the terminal form 'persp v pron',
with 44 occurrences, has as its valuation in H(D) ...".)

Notice that, if 'persp' refers ('refers' used
informally) to only one object, then allowing the
denotation of 'persp' to be the singleton means that subset
is still the correct semantical function.
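
The two valuations just given can be sketched directly with
Python sets. The individuals and the sample denotations are
hypothetical; only the subset test and the set-builder
expression come from the text.

```python
TRUE, FALSE = "TRUE", "FALSE"


def link_valuation(subject, predicate):
    """[X link Y]: IF [X] ⊆ [Y] THEN TRUE ELSE FALSE."""
    return TRUE if subject <= predicate else FALSE


def persp_v_pron(persp, verb, pron):
    """[persp v pron]:
    IF [persp] ⊆ { a | (∃ <a,b> ∈ [v]) (b ∈ [pron]) } THEN TRUE ELSE FALSE.
    """
    doers = {a for (a, b) in verb if b in pron}
    return TRUE if persp <= doers else FALSE


# Hypothetical basis valuations; singletons stand for single referents.
IT = {"rex"}
DOG = {"rex", "fido"}
I = {"erica"}
WANT = {("erica", "ball")}
THAT = {"ball"}
```

Because `IT` and `I` are singleton sets, the same subset test
serves for words that refer to one object and for words
that denote many.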

C. QUANTIFIERS AND ARTICLES

The implementation of quantifiers and articles is
certainly the most important part of the semantics to the
philosophically inclined. In fact, it is my suspicion that
a logician will judge a theory of the semantics of natural
language most on the ability of that theory to handle and
coordinate quantifiers.

My theory will not satisfy many in this regard. I
have not tried to develop a theory that will account for
much mathematical language at all. On the basis of Theorem
3, I suspect that context-sensitivity is needed for this.

The rules of the grammar GE1 that introduce
quantifiers and articles into sentences make use of the
semantic function QUANTIF. QUANTIF is a function of two
arguments, which are:

1) the denotation of the article or quantifier;

2) the denotation of the phrase being modified.

For example, the rule

(17,5) npsub -> quart adjp nounp

introduces quantifiers and articles into noun phrases.
(See the grammar GE1 in Table 3 of Chapter 4.) The
semantic function for this rule is

QUANTIF([quart], ([adjp] ∩ [nounp]))

(wherein we use the symbols on the right-hand side of the
rule to indicate the application of arguments). The
semantic function QUANTIF is defined in this section, and
it depends not only on the denotations of the words, but
also on the words themselves, i.e., which quantifier or
article was present. However, the function QUANTIF is
still a part of a context-free semantics, in that the
valuation returned by QUANTIF does not depend upon the
context of the phrase in the sentence.

I now indicate the denotations of the various
quantifiers and articles, where applicable, and give
algorithms for computing the function QUANTIF.

1. CARDINAL NUMBERS

Most of the cardinal numbers less than 10 occur in
ERICA. (Recall that cardinal numbers are classed as
'qu'.) Most of the usages are trivial, as for example in
counting exercises. I give cardinal numbers denotations
reminiscent of the Frege-Russell treatment of the notion of
cardinal, although simplified. The method is to let a
cardinal n be the set of all subsets of D of cardinality
n. For example,

[one] = { x ∈ P(D) | |x| = 1 }

[two] = { x ∈ P(D) | |x| = 2 }

[three] = { x ∈ P(D) | |x| = 3 }

Notice that no use is made here of any sort of hierarchy
despite the fact that a more complex use of language than
that found in ERICA might require it. Consider the
sentence: Two groups of girls were present. The
reasonable denotation [two] would have to include the set

{ x | (∃ y,z ∈ x) (y,z ∈ D)
      ∧ (∀ w ∈ x) (w = y ∨ w = z) }

When the quantifier is a cardinal number, the
valuation given to QUANTIF is given by

QUANTIF([cardinal number], [a string]) =
[cardinal number] ∩ P([a string]).

Hence, for the phrase 'two pretty girls' we obtain

QUANTIF([two], [pretty girls]) =
QUANTIF([two], ([pretty] ∩ [girls])) =
[two] ∩ P([pretty] ∩ [girls]).

This gives us the class of all two-element sets of pretty
girls.
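
The cardinal-number case of QUANTIF can be computed
directly for a small domain. This is a minimal sketch: the
domain D and the denotations of 'pretty' and 'girls' are
invented, and the function names are hypothetical.

```python
from itertools import chain, combinations


def cardinal(D, n):
    """[n] = { x ∈ P(D) | |x| = n }, as a set of frozensets."""
    return {frozenset(c) for c in combinations(D, n)}


def power_set(A):
    """P(A) as a set of frozensets."""
    items = list(A)
    subsets = chain.from_iterable(
        combinations(items, r) for r in range(len(items) + 1)
    )
    return {frozenset(s) for s in subsets}


def quantif_cardinal(card, phrase):
    """QUANTIF([cardinal number], [a string]) =
    [cardinal number] ∩ P([a string])."""
    return card & power_set(phrase)


# Hypothetical data base.
D = {"ann", "bea", "cy", "dot"}
PRETTY = {"ann", "bea", "cy"}
GIRLS = {"ann", "bea", "dot"}

TWO = cardinal(D, 2)
two_pretty_girls = quantif_cardinal(TWO, PRETTY & GIRLS)
```

With these data only ann and bea are both pretty and girls,
so the result contains the single two-element set made up
of them.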

Such noun phrases as 'the two pretty girls' do not
occur in ERICA; however, I indicate how to handle these
phrases in the next section.

2. THE DEFINITE ARTICLE

The definite article, 'the', occurs at least once
in 358 sentence types, representing 377 tokens, among the
9,085 tokens in ERICA. Uses of 'the' can be classed as
demonstrative and intensive, where the former serves to
distinguish an object while the latter seems to do little
semantically at all. Some examples of the actual sentences
follow.

DEMONSTRATIVE USES OF 'the'

FREQ   SENTENCE

   3   to the zoo we went.
   2   in the water.
   2   in the castle.
   2   put it on the microphone.

INTENSIVE USES OF 'the'

FREQ   SENTENCE

   2   i lost the other one.
   1   all the clothes.
   1   and the soldiers will come.
   1   all the shapes.

This distinction is certainly not hard and fast, but making
it tends to point out the degrees of semantic import the
word 'the' has.

In the classical theory of definite descriptions,
the word 'the' is treated as an operator picking out the
object uniquely possessing a certain property; the
classical example is, of course

*) Scott is the author of Waverly.

where the phrase 'the author of Waverly' denotes Scott
uniquely. The logical form (11) of this sentence is
something like

s = (iota x) W(x)

where s is the constant denoting Scott, W(x) is the
predicate for 'x wrote Waverly', and iota is the definite
description operator.

Looking at the usages of the word 'the' in ERICA
suggests a more complicated notion of description. Nearly
10 percent of the usages of 'the' occur with plural noun
phrases, such as

     The tapes are going around.

Plurality could, of course, be accommodated by
picking out a distinguished set, which may have more than
one element. The classical theory of definite description
has usually been stated only for predicates that are true
of one object, but the extension to sets is an obvious one.

(11) I am somewhat unhappy about using the phrase
'logical form', since it may evoke many things beyond what
I intend. I use the notion informally to mean the sentence
in first-order logic, with set notation, that would be a
representation for the given English sentence. I have
nothing more formal in mind than the talk about translating
ordinary language that is customary in elementary logic
courses.

My inspection of the uses of 'the' in ERICA leads
me to believe:

1) it is clear that the phrases using 'the' are
perfectly clear to Erica and her conversants, so nothing
very strange is happening;

2) the word 'the' is doing something: it has
semantical import, and is not always there merely for some
kind of syntactic filling, as I had suspected might be the
case;

3) while 'the' is picking out a distinguished set
of objects, it is not clear that many phrases might be
simultaneously meaningful, such as:

     the man
     the two men
     the five men
     the three most handsome men

To countenance this in a theory that extends the
classical theory of descriptions, I suggest the notion of
contextual orderings.

The first semantical concept I offer is the notion
of the set IMMED, the set of objects of immediate
importance to Erica. The initial reason for offering this
is that many of Erica's utterances are elliptical and
assume a limited domain for much of the conversation. Of
course, the conversation may gradually change in topic, and
when it does, the domain of immediate importance will
change. Language provides for ways of "changing the
subject", for example, by using proper nouns to bring new
objects to the forefront of the conversation.

The set IMMED is the contextual parameter in my
semantic model that contains the things of contextual
interest or concern to Erica. The assumption is that
careful examination of the context of utterance, the
physical surroundings, and the notes of the adults would
enable us to estimate this parameter at any given time and
to account for the ways that objects are added to and
subtracted from IMMED. I think that it is not as large a
set as one might suspect.

The need for a contextual parameter in the
semantics is illustrated by looking at various phrases in
ERICA and noticing that the same phrase will appear to
denote different things in different occurrences of the
phrase. Notice the occurrences of the noun phrase 'the
water' in the following utterances from ERICA.

SOME OCCURRENCES OF THE PHRASE 'the water' IN ERICA
FREQ    UTTERANCE

  3     in the water.
  1     he goes in the water.
  1     he spilled the water.
  1     look at the water.
  1     that's the water and let me go in there.

Looking at the contexts, it is utterly implausible
to believe that the same object is denoted throughout.
Hence, the need for a contextual parameter.

I will define IMMED from the set IMMED1. Let
IMMED1 be a subset of the domain D. The interpretation is
that the elements of IMMED1 are the objects of importance
in the conversation (at a given time).

Let R be a binary relation (ordering) on the set
IMMED1 satisfying the following properties:

     i) TRANSITIVITY: if xRy and yRz then xRz,
        for x,y,z ∈ IMMED1;

     ii) CONNECTEDNESS: xRy or yRx, for x,y ∈ IMMED1.

Thus, R is a weak ordering. One of the requirements,
connectedness, may be too strong. Intuitively, xRy
means 'x is at least as important as y'.

Based on the structure given to IMMED1 by the
ordering R (which may present a lot of structure, or very
little), I want to include certain subsets of IMMED1 in
IMMED. Perhaps I can motivate this by the claim that I
think the following phrases may all be meaningful:

     the men
     the man
     the three men

while, at the same time,

     the two men

may be meaningless, or at least sufficiently unclear as to
require a 'HUH?' from the listener. My claim about a
conversation, such as the ERICA corpus, is that at each
moment in the conversation there exists a set of objects
IMMED1 together with the relation R, which intuitively
means the relative importance of the objects in IMMED1.

It is now possible to define IMMED from IMMED1 and
R. Actually, I want to define IMMED relativized to some
set T, so I define first the set IMMED(T). Then, IMMED =
IMMED(D), where D is the domain.

Let IMMED(T) be the smallest set such that

     1) (IMMED1 ∩ T) ⊆ IMMED(T);

     2) if S ⊆ (IMMED1 ∩ T) then S ∈ IMMED(T) if
        and only if

          (∀ x ∈ (IMMED1 ∩ T) - S) (∀ y ∈ S)
               [if xRy then not yRx and
                if yRx then not xRy].

I shall call such a set S a clean section of IMMED1
relative to T.

Thus, IMMED contains the objects of contextual
importance IMMED1, together with those subsets of IMMED1
that can be determined by the ordering, subject to the
requirement that a subset must be neatly delineated by the
ordering.

It is now possible to give the algorithm for the
semantic function QUANTIF in the case that the article is
the word 'the'. It is a conditional function, giving
a series of evaluations to be attempted.

*)   QUANTIF([the], [<expression>])

     1) If <expression> is syntactically singular,
and there is a singleton set S in
IMMED([<expression>]) such that S ⊆ [<expression>],
then evaluate to: S;

     ELSE

     2) If <expression> is syntactically singular,
then there is no evaluation.

     ELSE

     3) If IMMED1 ∩ [<expression>] is not null
then evaluate to: IMMED1 ∩ [<expression>]

     ELSE

     4) If <expression> contains a cardinal number, let s be
the size of the elements of [<expression>]; then
[<expression>] is computed by

     QUANTIF([<cardinal>], [<expression 2>])

for some <expression 2>. If there is a unique
set S ∈ IMMED([<expression>]) such that |S| = s and
S ⊆ [<expression 2>], then evaluate to: S

     ELSE

     5) the expression *) does not evaluate.



As an example, consider the phrase

     the five men.

Let IMMED1 = {a1,...,a15,t},
and [men] = {a1,...,a15,c,d},
and let the relation R be given by the following diagram
(where the higher elements are more important, and elements
on the same level are equally important):

                    a1
               a2   a3   a4
               a5   a6   a7
                  a8   t
             a9   a10   a11   a12
               a13   a14   a15

We restrict the ordering to [men] ([five men] would be more
correct -- this would require some added complexity of the
above conditional function). This removes the element t
from consideration. The only 5-element clean section is
the set

     {a8,a9,a10,a11,a12},

and hence, that is the denotation of the phrase 'the five
men'.

The phrase 'the men' denotes the set

     {a1,...,a15}.

Since there is no 2-element clean section, the phrase 'the
two men' does not denote. The phrase 'the man' selects two
1-element clean sections. The above algorithm says that it
therefore does not denote. Alternatively, we might select
the highest clean section, and let

     [the man] = {a1}

which is intuitively correct.

Notice that the algorithm gives the classical
results of the theory of definite description where
applicable, yet the theory is extended to include other
sets as well that are a part of natural discourse.
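
The worked example can be replayed with a short Python sketch. It is a simplification rather than the conditional function itself: the ordering is given as a list of importance levels (most important first), a clean section is taken to be a run of adjacent whole levels after restriction to the noun's denotation (an assumption chosen because it reproduces the results quoted above), and the singular case is required to be unique, as the discussion of 'the man' suggests. All function names are hypothetical.

```python
def restrict(levels, T):
    """Restrict each importance level to T, dropping emptied levels."""
    kept = [frozenset(level) & frozenset(T) for level in levels]
    return [level for level in kept if level]

def clean_sections(levels, T):
    """Assumed reading: unions of adjacent whole levels, restricted to T."""
    lv = restrict(levels, T)
    sections = []
    for i in range(len(lv)):
        acc = frozenset()
        for j in range(i, len(lv)):
            acc = acc | lv[j]
            sections.append(acc)
    return sections

def the(levels, noun, size=None, singular=False):
    """Evaluate 'the <noun>' or 'the <size> <noun>'.

    Returns the denoted set, or None when the phrase does not denote.
    """
    immed1 = frozenset().union(*levels)
    if singular:
        size = 1
    if size is None:
        # Plural without a cardinal: IMMED1 ∩ [noun], if not null.
        return (immed1 & frozenset(noun)) or None
    hits = [s for s in clean_sections(levels, noun) if len(s) == size]
    return hits[0] if len(hits) == 1 else None

# The ordering from the diagram, most important level first.
levels = [
    ["a1"],
    ["a2", "a3", "a4"],
    ["a5", "a6", "a7"],
    ["a8", "t"],
    ["a9", "a10", "a11", "a12"],
    ["a13", "a14", "a15"],
]
men = {f"a{i}" for i in range(1, 16)} | {"c", "d"}

the_five_men = the(levels, men, size=5)
the_two_men = the(levels, men, size=2)
the_man = the(levels, men, singular=True)
the_men = the(levels, men)
```

Choosing the highest 1-element section for 'the man' instead of returning None would be a one-line change to the final return.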




3. THE INDEFINITE ARTICLE
When the quantification theory of predicate logic
is applied informally to natural languages, the existential
quantifier '∃' is often used to represent the indefinite
articles 'a' and 'an'. These words occur somewhat more
frequently in ERICA than the definite article.

INDEFINITE ARTICLES IN ERICA
          TYPES     TOKENS

a          788       857
an          15        16

These words modify singular noun phrases exclusively.
Presumably,

     [a] = [an]

so I will identify the two forms of the indefinite article
and talk only about 'a'. In about one-third of the cases,
'a' points rather non-specifically, as if to say some
singular but unidentified, perhaps unfamiliar, object.
Such cases include:

     1     there's a farmer in there.

     1     those are for a boy.

In many other cases (perhaps as many as iiOO) the word 'a'
functions as a kind of generic pointer, meaning "something
of this kind or satisfying these properties". Examples of
this include:

     2     i want to read a book.

     1     you are making a house.

When Erica says

     i want to read a book,

it is plausible that she is thinking of the criteria that
specify "bookness", rather than a class of books (12).

(12) The treatment of semantics herein considered
is extensional. Without involving myself in a discussion
of modalities de dicto and de re, I would like to remark
that there is more than a little modality in Erica's
speech.

One solution that has occurred to me -- one that is
reasonably consonant with set-theoretical semantics -- is to
have essential objects in the data structure (ontology, if
you will). In this way, the denotation of the phrase

     a book

could be an essential book. I am tempted to recommend this
as an explanation for linguistic development of children.
Perhaps there is a confusion between properties and
objects, and the child, in learning a cluster of
properties, reifies them. Or perhaps parents foster a
realism upon the child (one that they themselves have
discarded) to facilitate learning the difference between
oranges and pears.

I think this is something to consider in examining
the semantics of children's languages.

The most straightforward definition of QUANTIF,
when the article is 'a', is

     QUANTIF([a], [<expression>]) =
          IMMED ∩ [<expression>]

This seems to work rather well in cases where the article
occurs in the predicate of the utterance. For example,

     [I'm a big girl] =

          if [I] ⊆ (IMMED ∩ ([big] ∩ [girl]))
               then TRUE else FALSE.

The grammar GE1 is deficient in regard to the
semantics of many phrases containing 'a'. There are
approximately 100 utterances in ERICA that contain 'a' in
the subject for which GE1, as it stands, gives the wrong
semantics. Consider the utterance

     1     a boy has that one.

The logical form of this utterance is something like

     (∃x) (x is a boy and x had that one).

The rules of GE1 simply check to see whether or not the
subject is a subset of the predicate. Hence, we have

EVALUATION

     if [a boy]
          ⊆ { x | (∃<x,y> ∈ [had]) (y ∈ ([that] ∩ [one])) }
               then TRUE else FALSE.

Clearly, no denotation [a boy] makes this plausible.
Instead, we need to change the rules for GE1 to check for
'a' in the subject, in which case we could have something
like

EVALUATION

     if [a boy] ∩
          { x | (∃<x,y> ∈ [had]) (y ∈ [that] ∩ [one]) }
               ≠ ∅ then TRUE else FALSE.
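
The difference between the two evaluations is easy to see on a toy model. The sketch below is illustrative only: the individuals, the relation [had], and the contents of IMMED are invented, and the helper names are hypothetical.

```python
def subjects_related_to(relation, objects):
    """{x | there is a pair <x,y> in relation with y in objects}."""
    return {x for (x, y) in relation if y in objects}

def eval_subset(subject, predicate):
    """Evaluation as GE1 stands: subject must be a subset of predicate."""
    return subject <= predicate

def eval_nonempty(subject, predicate):
    """Proposed evaluation for 'a' in the subject: non-null intersection."""
    return bool(subject & predicate)

# Hypothetical model, for illustration only.
immed = {"tom", "sam", "ball"}
boy = {"tom", "sam"}
had = {("tom", "ball")}
that_one = {"ball"}

a_boy = immed & boy                       # QUANTIF([a], [boy])
predicate = subjects_related_to(had, that_one)

as_it_stands = eval_subset(a_boy, predicate)
proposed = eval_nonempty(a_boy, predicate)
```

With two boys in IMMED and only one of them having the object, the subset test fails while the intersection test succeeds, which matches the existential logical form.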


Some additional rules (perhaps several dozen) need
to be added to GE1 to generate sentences wherein the
subject is modified by the indefinite article; the
appropriate semantic functions can then be associated with
these rules.

4. THE UNIVERSAL QUANTIFIER
The word 'all' occurs in 100 utterance types,
accounting for 128 tokens. For simplicity, I let

     QUANTIF([all], [<expression>]) =
          [<expression>]

as opposed to, say, restricting [<expression>] to the set
IMMED1. This appears to work in about 75 percent of the
cases. The remaining 25 percent use the word 'all' in the
sense of 'completely', as in

     [the kitty all green] =

          if [the kitty] ⊆ [green] then TRUE else FALSE.

This is rather strange; it says that the kitty is a green
thing, rather than the stronger interpretation of being
completely green. I take it that these cases use 'all' as
an attributive adjective rather than a quantifier.

This use of 'all' occurs in ERICA only when
<expression> is an adjective phrase, so the rules for
QUANTIF could be modified if I were willing to handle
attributive adjectives, which I am not. However, this
would give the wrong result to

     1) men are all mortal.

which presumably has the same meaning as

     2) all men are mortal.

and therefore, 'all' is not attributive in 1).
Some utterances using 'all' follow.

     6     all gone.
     6     it's all gone.
     4     he's all black.
     4     it all gone.
     4     they're all gone.
     3     all finished.
     3     that's all i got.
     2     all up.
     2     all i have.
     2     he's not all black.
     2     i all finished.
     2     they're all gone.
     1     it's all gone.
     1     'cause they're all gone.
     1     all well.
     1     all gone?
     1     all mlad.
     1     all gone ...
     1     all the way.
     1     all those ...
     1     all fall down.

D. PREPOSITIONS

Prepositions are used in GE1 in two ways:

1) As a syntactic part of a verb associated with
the preposition. Table 2 lists the sentence types
requiring rules (3,8) and (4,35), which associate a verb
with a preposition. It is important to realize that the
semantic functions associated with these rules are not
concerned with the denotations of the prepositions
involved. For example, the lexical form

     persp v pronadj n prep,adv

represents the utterances

     4     i dumped my puzzles out.
     1     i dump my puzzles out.
     1     i put my dishes away.

The valuation of these is given by

     if [persp] ⊆
          { a | (∃<a,b> ∈ [COMBINE([v],[prep])])
                (b ∈ [pronadj] ∩ [n]) }
               then TRUE else FALSE.

The syntactic function COMBINE concatenates the verb with
the preposition to form, for example, the separable verb

     dumped#out.

This is then considered to be the syntactic unit in the
utterance.

I might add that the function COMBINE does the same
work that would be done by a transformation designed to
convert the tree

     [tree diagrams for 'i dumped my puzzles out'; not recoverable]

I do not explicitly use transformations; however, it might
be clearer to do so in this case.

2) Two other rules ^ (7^1) and (o^^ii allow 
prepositional phrases to modify noun phrases. (The reason
for duplications of rules in the grammar GE1 relates to the
fact that GE1 is also a probabilistic grammar. Often it is
necessary to repeat the same process two or more times in a
probabilistic grammar in order to account for statistical
differences in the data.)

The denotation for a preposition is:

     [prep] ⊆ D^2

The rule that generates prepositional phrases is

     (12,1) prepp -> prep np

and the semantics is

     [prepp] = [prep np] =
          { a | (∃<a,b> ∈ [prep])
                (b ∈ [np]) }.

Hence, the noun phrase

     capitol of France

has as its denotation

     [capitol] ∩
          { a | (∃<a,b> ∈ [of])
                (b ∈ [France]) }.
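
As a rough illustration of this semantics, the following Python sketch treats [prep] as a set of ordered pairs and computes the denotation of a prepositional phrase by the set comprehension above. The pairs placed in [of] and the other denotations are invented for the example, and the function name is hypothetical.

```python
def prepp(prep, np):
    """[prep np] = {a | there is <a,b> in [prep] with b in [np]}."""
    return {a for (a, b) in prep if b in np}

# Hypothetical denotations, for illustration only.
of = {
    ("paris", "france"),
    ("rome", "italy"),
    ("eiffel_tower", "paris"),
}
capitol = {"paris", "rome"}
france = {"france"}

# [capitol of France] = [capitol] ∩ [of France]
capitol_of_france = capitol & prepp(of, france)
```

Because [of] is a bare relation, anything paired with France would enter [of France]; the intersection with [capitol] is what narrows the result.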

As previously mentioned, this is not the most natural way
to handle prepositions. The preferable way is to view the
preposition as a function -- e.g.,

     CAPITOL-OF(x).

The preposition 'with' is perhaps a paradigm for my
semantics for prepositions. In a quite natural way,
[with] can be thought of as the set of pairs <x,y> such
that x is in the accompaniment of y. Other
prepositions, such as the ubiquitous 'of', do not in
themselves represent a single, clear semantical notion, and
hence my treatment does not do such prepositions justice.

E. ADVERBS

Adverbs form the most complex semantic class I've
considered. Here I am particularly afraid that trying to
make GE1 a good probabilistic grammar has hurt the semantic
treatment.

Two views of the semantics of the adverb appear
reasonable:

1) The adverb is a function. Given a set A,
ADVERB(A) ⊆ A, generally; for example, the adjectival
phrase

     [very good] = VERY([good])

where VERY is the function associated in the model W with
the adverb 'very'.

2) Alternatively, notice that most properties to
which adverbs are applied can be thought of as orderings.
The adverb then selects the appropriate section of the
ordering. As an illustration, suppose that the ordering
given by the adjective 'good' is:

ORDERING ON D GIVEN BY THE ADJECTIVE 'GOOD'

                    x1
          very    x2  x3
                    x4
                  x5  x6

                    x10
                  x11  x12

The adverb 'very' then selects the appropriate part of the
ordering in question.

I do not intend to develop either theory in any
detail, except to remark that 1) seems a bit too general to
be useful in analyzing a child's language. 1) is a
brute-force approach to the semantics of adverbs. 2)
requires some analysis of the structure of some particular
adjectives and adverbs in Erica's speech, to see if it is
tenable or not. (Incidentally, I think that the child
thinks in terms of very clean and simple orderings on
objects; I don't think that the analysis of the ordering
given by an adjective, say 'good', would be as complicated
as might be suspected.)

In the semantic functions I use the function
MEASURE of three arguments, which are:

1) The first argument is a dummy argument that
preserves some of the structure of the subtree involved.
It does not currently play a part in the semantics.

2) The adverb.

3) The set the adverb is functioning upon.
Presumably, the concept represented by the set would have
to provide an ordering. Hence, if 'pregnant' does not
admit to "more and less", then 'very pregnant' is
meaningless. (From experience, I am however quite certain
that 'pregnant' does admit to degrees.)
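
One way to picture MEASURE under the ordering view is sketched below in Python. Everything concrete here is an assumption made for illustration: the ordering for 'good', the lookup table from sets to orderings, and in particular the reading of 'very' as "the upper half of the levels" are invented, since the text does not fix how a section is chosen.

```python
def very(levels):
    """Hypothetical reading of 'very': the upper half of the levels."""
    k = max(1, len(levels) // 2)
    return frozenset().union(*levels[:k])

def measure(structure, adverb, target, orderings):
    """MEASURE(structure, adverb, set).

    `structure` is the dummy first argument and is ignored.
    `orderings` maps a set to its levels (highest first); a set
    with no ordering yields None, i.e. the phrase is meaningless.
    """
    levels = orderings.get(frozenset(target))
    if levels is None:
        return None
    return adverb(levels)

# Hypothetical model, for illustration only.
good = frozenset({"x1", "x2", "x3", "x4", "x5", "x6"})
pregnant = frozenset({"p1", "p2"})
orderings = {
    good: [["x1"], ["x2", "x3"], ["x4"], ["x5", "x6"]],
    # no entry for `pregnant`: assumed here to admit no degrees
}

very_good = measure(("ADJP", "ADVP"), very, good, orderings)
very_pregnant = measure(("ADJP", "ADVP"), very, pregnant, orderings)
```

Supplying an ordering for `pregnant` would make 'very pregnant' denote, in line with the parenthetical remark above.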

Several rules -- (4,21), (4,22), (4,23), and
(4,38) -- introduce interrogative adverbs (such as 'where',
'how') into the sentence. I now believe that these should
be handled quite separately by a grammar with more
individually suited rules.

F. OTHER WORDS
Interrogative pronouns (words classed as 'inter')
ask questions. The meaning of a question Q, I shall say,
is the set S such that a description of S is the correct
answer to Q. Interrogative pronouns have no denotation,
but are instead 'logical' words. (See Chapter 6 for a
discussion of the rules that introduce interrogative
pronouns.)

Other logical words include 'conj' (conjunctions)
and 'neg' (negating words). Interjections ('int') play no
semantic role in my analysis, either denotative or logical.

CHAPTER 6 -- THE SEMANTICS OF ERICA

I. THE SEMANTICS OF THE GRAMMAR GE1

In Chapter 5 I discussed the basic denotations
given to the lexical categories of words in the dictionary.
These denotations were, of course, selected with a mind to
the kinds of semantic functions that would be assigned to
the productions of the grammar GE1.

Here follows a discussion of the individual rules
of GE1. For each rule, I give the semantic function, and
then report on the results of using the rule on the data.
Lexical disambiguation was accomplished by the
probabilistic model of lexical disambiguation (see Chapter
4). In some of the more interesting cases, I list the
terminal forms involved, and some of the original
utterances. The format is the following: first, the
label and the production are given, then the following
statistics about the usage of the rule in the ERICA
corpus.

(1) I have tried to concentrate on the problems and
inadequacies of this semantics in this section.

Space does not permit me to list all the
transformations of the data that I used in preparing the
summary given here, since it runs several thousand pages.
However, the listings are available to anyone interested in
this research in a more detailed way.

1) TYPES: the number of terminal forms that used
the rule;

2) TOKENS: the number of original utterances that
the TYPES represented;

3) TIMES USED: how many times the rule was used in
ERICA (where a given terminal form may have used the rule
more than once; this could either have been because a
derivation of the form used the rule repeatedly or because
there are several derivations of the form, each of which
used the rule);

4) TIMES USED * FREQUENCY: the frequency of a form
multiplied by the number of times the form used the rule,
summed over the forms.
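
The four statistics can be computed mechanically from a list of terminal forms. The Python sketch below is a minimal illustration, assuming each form is recorded with its frequency and the number of times the rule was used on it (summed over its derivations); the class name, field names, and the three sample rows are illustrative and do not reproduce the tabulation for any rule.

```python
from dataclasses import dataclass

@dataclass
class FormUsage:
    form: str        # terminal form, e.g. "persp link adv adj"
    frequency: int   # utterance tokens having this terminal form
    times_used: int  # uses of the rule, summed over derivations

def rule_statistics(rows):
    """Compute TYPES, TOKENS, TIMES USED, and TIMES USED * FREQUENCY."""
    return {
        "types": len(rows),
        "tokens": sum(r.frequency for r in rows),
        "times_used": sum(r.times_used for r in rows),
        "times_used_x_frequency": sum(
            r.frequency * r.times_used for r in rows
        ),
    }

# Illustrative rows only.
rows = [
    FormUsage("persp link adv adj", frequency=7, times_used=1),
    FormUsage("persp link adv adv adj", frequency=2, times_used=3),
    FormUsage("adv adv adj n", frequency=1, times_used=3),
]
stats = rule_statistics(rows)
```

When every form uses the rule exactly once, TIMES USED equals TYPES and TIMES USED * FREQUENCY equals TOKENS; the statistics diverge only for repeated or ambiguous uses.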

If the complete list of terminal forms is given for
a rule, then the following information is included:

1) column 1: the frequency of the form in ERICA,
after lexical disambiguation;

2) column 2: the number of derivations of the form
by GE1;

3) column 3: the form, followed by the number of
times the rule was used for the form, if this number is
different from 1.

Following this, the semantic function I used for
the rule is displayed. The format is as described in
Chapter 5. In addition to simple set-theoretical
functions, the special functions QUANTIF and MEASURE are
used with their special definitions assumed as given in
Chapter 5. Several other functions are also defined as
needed.

After lexical disambiguation by the probabilistic
method, there were 1,060 terminal forms, representing 7,046
utterance tokens in ERICA.



1. ADJECTIVE PHRASE RULES

(1,1) adjp -> adj

Types = 193               Tokens = 539
Times used = 214          Times used * Frequency = ;j56

Semantics: [adj]

An adjp is, to characterize it informally, a string
of common adjectives (adj) preceded by an optional
adverbial phrase.

Rule (1,1) is the simplest of the rules that
introduce such strings.

(1,2) adjp -> adjp adj

Types = 39                Tokens = 63
Times used = 56           Times used * Frequency = 83

Semantics: [adjp] ∩ [adj]

This is the recursive adjective phrase rule. The
forms using it are listed in Chapter 4, so I do not repeat
them here.

(1,3) adjp -> advp adjp

TERMINAL FORMS

Types  No. of        Form                            Times rule used on form
       Derivations                                   (if different from 1)

  7        1         persp link adv adj
  5        1         adv adj
  4        1         pron link adv adj
  3        1         adv adj n
  3        1         persp link neg adv adj
  2        1         link adv adj
  2        2         persp link adv adv adj                  3
  1        2         adv adv adj n                           3
  1        5         adv adv adj adj                         8
  1        1         adv adj a prep pronadj n
  1        1         conj pronadj adv adj
  1        1         conj pron link adv adj
  1        1         conj persp link adv adj
  1        1         int adv adj
  1        1         n link adv adj n
  1        1         neg adv adj
  1        1         persp v adv adj n
  1        1         persp link adv adj n
  1        1         persp link neg art adv adj n
  1        2         persp link adv adv adj pron n           3
  1        1         pron link neg adv adj
  1        2         pron link adv adv adj                   3

Types = 22                Tokens = 41
Times used = 37           Times used * Frequency = 58

Semantics: MEASURE(<ADJP,ADVP>, [advp], [adjp])


This rule modifies adjective phrases with adverbial
phrases. Only one form has two (or more) adjectives
together:

     1     adv adv adj adj

The original utterance is

     1     in here any more

which contains the adverbial phrases 'in#here' and
'any#more', which should be reclassified in the dictionary.

The form

     1     adv adv adj n

represents the sentence

     very very angry now.

The word 'now' is very likely misclassed in the dictionary.

When two adverbs modify an adjective phrase, there
are two semantic interpretations possible, as shown by the
following denotations for 'adv adv adj n':

     1) MEASURE(<ADJP,ADVP>, [ADV],
             MEASURE(<ADJP,ADVP>, [ADV], [ADJ])) ∩ [N]

This first interpretation is that both adverbs modify the
adjective in turn.

     2) MEASURE(<ADJP,ADVP>,
             MEASURE(<ADJP,ADVP>, [ADV], [ADV]), [ADJ])
             ∩ [N]

This second interpretation is that the first adverb
modifies the second.

Let me elaborate a bit on this ambiguity. The
intuition behind the function MEASURE is that the adverb
assumes an ordering on the modified set and then extracts a
section from that ordering. The other notion of adverbs
that I considered in Chapter 5, and rejected, is that the
adverb selects a subset of the modified set. (This second
more general interpretation seems too non-specific to be
helpful in describing the semantics of ERICA.)

No good examples of this ambiguity appear in ERICA
to my knowledge. Some fictitious examples are the
adjective phrases:

     a) somewhat overly protective

     b) fairly well considered

For a) the correct order of modification is given by 1),
whereas for b) the correct order is 2). Notice that we
would, intuitively, group 'overly protective' together,
then modify by 'somewhat' in a), whereas in b) the tendency
is to group 'fairly well' together.

Of course, some ways of handling the function
MEASURE could yield semantic equivalence, but I think that
in the above example it is sufficiently clear to indicate
that this is not always the case.

The interpretation favored by the probabilistic
grammar is 2). The conditional probabilities for the
interpretations are:

     1) .39

     2) .61

All utterances in ERICA that have an adverbial
phrase of two or more adverbs, thereafter modifying an
adjective phrase, present this semantic ambiguity. The
original utterances, listed by the terminal forms involved,
follow. (The line beginning '(From:' indicates the
lexical form involved. Often, since lexical disambiguation
has occurred, some consolidation has occurred. See Chapter
4. Text beginning with '(Remark' contains a comment about
the previous group of utterances.)

(From: persp link adv adv adj)
     i was very very scared.
     it's very very sharp.

(From: adv adv adj n)
     very very angry now.

(From: adv adv adj adj)
     in here any more.

(From: persp link adv adv adj pron n)
     i be very very careful this morning.
(Remark: 'this morning' is not a predicate nominative as
the grammar says it is. Again, this is an adverbial phrase
that needs to be reclassified in the dictionary.)

(From: pron link adv adv adj)
     those are very very high.

Looking at these utterances involving two adverbs,
it is not clear which interpretation is to be favored. If
we believe the probabilistic grammar, we would try to
analyze 'very very' as an adverbial function, since this
interpretation is favored with a conditional probability of
.61. One would like to see a greater variety of adverbs to
make any claim, since 'very' is the only adverb using this
construction in ERICA. See Section II for further
discussion of ambiguity.

2. ADVERBIAL PHRASE RULES

(14,1) advp -> adv

Types = 55                Tokens = 260
Times used = 70           Times used * Frequency = ^77

Semantics: [adv]

(14,2) advp -> advp adv

Types = 8                 Tokens = 29
Times used = 10           Times used * Frequency = 31

Semantics: MEASURE(<ADVP,ADV>, [advp], [adv])

Rule (14,2) is the recursive adverbial phrase rule.
The forms are given in Chapter 4.


3. QUANTIFIER-ARTICLE RULES

The symbol 'quart' introduces quantifiers and
articles into utterances.

Notice that the class of 'qu' contains the cardinal
numbers, and the function QUANTIF handles the semantics for
these. A more syntactically elegant but semantically
equivalent approach would use an added symbol 'card' for
the cardinal numbers, making the semantic difference
explicit in the syntax. This is to be preferred from a
conceptual point of view, since it makes a semantic
distinction clear in the syntax. The chief reason that I
did not do this is that there appeared to be little
difference in the way the various quantifiers were
distributed statistically in the corpus and hence no
syntactic justification for the added symbol.

This may be a case of the syntax diverging a bit
from the semantics. I think that the ERICA corpus offers
too little developmental evidence to be certain. We would
want to look over a slightly longer period of time. (Erica
was between 31 and 33 months old at the time of the
recordings.)

The semantics for rules (21,1) and (21,2) is simply
the identity function. This is because the function
QUANTIF, as described in Chapter 5, is called by the rules
that actually introduce the 'quart' into utterances. See
rules (22,2), (22,3), (17,4), and (17,5).

(21,1) quart -> qu

Types = 117               Tokens = 277
Times used = 140          Times used * Frequency = 307

Semantics: [qu]

(21,2) quart -> art

Types = 257               Tokens = 821
Times used = 302          Times used * Frequency = 882

Semantics: [art]



4. ADJECTIVE PHRASE RULES -- POSSESSIVE ADJECTIVES

The symbol 'adp' introduces the symbol 'det' to
precede strings of common adjectives (adjp). The symbol
'det' then is replaced by either 'pronadj' (pronominal
adjectives) or 'padj' (possessive adjectives). These rules
are not included among the adjp-rules since, as a
probabilistic grammar, GE1 accounts for the fact that
possessives usually precede common adjectives. For
example, notice the two utterances representing the form

(From: adv link pronadj adj n)
     1     here is my big quilt.
     1     there is my new <n>.
(Remark: the symbol '<n>' stands for unidentifiable
noun.)

I have not found in ERICA a single example of a possessive
occurring after a common noun in a modifying phrase. GE1
accounts for this; the price paid is the use of rules that
have no apparent semantic content.



Types a 62 Tokens « 157 

Times used » 68 Times used • frequency « 164 



Semantics: [adjp] 

llxll adfi dfit 



Types » 115 Tokens'- 297 

Times used » 139 Times used • irequeiicy » 327 



Semantics} [det] 



(9.3) adg :2 dfit adj^ 



TERMINAL FORMS

Types  No. of       Form                                   Times rule used on form
       Derivations                                         (If different from 1)

2                   adv link pronadj adj n
2                   intadv aux pronadj adj n
2                   persp v pronadj adj n
1                   mod persp v pronadj adj n
1                   neg pron link pronadj adj
1                   persp link pronadj adj n
1                   persp mod neg v pronadj adj n
1                   prep art n adj n prep pronadj adj n
1                   :sp pronadj adj n
1                   pron pronadj adj n
1                   pronadj adj n
1                   pronadj adj n conj art n
1                   v pronadj adj n

Types = 13 Tokens = 16

Times used = 13 Times used * Frequency = 16
Semantics: [det] ∩ [adjp]
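
The intersection semantics of rule (9,3) can be illustrated with a small set model. This is only a sketch: the individuals and the two denotations below are invented for illustration and are not taken from the ERICA corpus or from the model used in this work.

```python
# Hypothetical toy model: denotations are plain Python sets of individuals.
# All names below are assumptions made for illustration only.
det_my = {"quilt1", "doll1", "shoe1"}   # assumed [pronadj] for 'my'
adjp_big = {"quilt1", "truck1"}         # assumed [adjp] for 'big'


def adp_denotation(det_set, adjp_set):
    """Denotation of 'adp -> det adjp': the intersection [det] ∩ [adjp]."""
    return det_set & adjp_set


my_big = adp_denotation(det_my, adjp_big)
```

Under these assumptions the phrase 'my big' denotes only those individuals that are both in [det] and in [adjp]; a following noun would narrow the set further by another intersection.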

5. RULES FOR ADJECTIVE PHRASES NOT PRECEDING NOUN PHRASES

Several rules introduce adjective phrases that do
not precede a noun phrase. These rules are: (T^^), (4,9),
(4,12), and (4,41). When an adjective phrase stands alone,
the effect of a 'quart' (quantifier or article) must be
made on the adjective phrase alone. As an example,
consider the form

7 persp link qu adj 

representing 



4 he's all black.

he's all green.
it's all better.
he ;id all better. 

The denotation for these is 



if [persp] ⊆ QUANTIF([qu], [adj]) then
TRUE else FALSE.
As I mentioned in Chapter 5, this use of 'all' is not
really as a quantifier, but rather an adjective (possibly
attributive). Since the uses of 'all' that have this sense
are connected with adjective phrases not preceding a noun,
the semantics could be modified to handle it easily enough;
for example,

if [persp] ⊆ ALL([adjp]) then
TRUE else FALSE,

using a function ALL to compute the appropriate subset of
[adjp]. I am not clear about all the implications that
this sort of thing would have.

The qadp-rules generate adjective phrases that do
not precede nouns.

(22,1) qadp -> adjp
Types = 43 Tokens = 213

Times used = 49 Times used * Frequency = 220
Semantics: [adjp]



TERMINAL FORMS

Types  No. of       Form                                   Times rule used on form
       Derivations                                         (If different from 1)

7                   persp link qu adj
6                   qu adj
3                   art adj
2                   persp link neg qu adj
1                   art adj adj
1                   persp link art adj
1                   persp link neg qu adj adj adj
1                   pron link art adj
1                   pron link art adj adj adj

Types = 9 Tokens = 23

Times used = 9 Times used * Frequency = 23

Semantics: QUANTIF([quart], [adjp])



TERMINAL FORMS

Types  No. of       Form                                   Times rule used on form
       Derivations                                         (If different from 1)

3                   pron link art
2                   art
1                   link qu
1                   link pron qu

Types = 4 Tokens = 7

Times used = 4 Times used * Frequency = 7

Semantics: QUARTC([quart])

The QUARTC function is given by

QUARTC([quart]) = QUANTIF([quart], IHMEli)
I list below, by terminal form, the utterances using this
function.

3 pron link art 

that's a ...
there's a ...
this is a ...

(Remark: These appear to be fragments.)



2 art

2 a.

1 link qu

is this.



(Remark: Lexical disambiguation appears to have failed on
'link qu', since the word 'this' is probably a pronoun
rather than a quantifier. It is, of course, classed in the
dictionary as both.)



1 link pron qu

is another one?



(Remark: Another failure of lexical disambiguation.)

Most of these utterances appear to be fragmentary,
so there is little to conclude about the value of the
QUARTC function.



(22,4) qadp -> det

TERMINAL FORMS

Types  No. of       Form                                   Times rule used on form
       Derivations                                         (If different from 1)

10                  pronadj
7                   pron link padj
6                   pron link pronadj
4                   padj
2                   persp link padj
1                   neg pron link pronadj
1                   persp link pronadj

Types = 7 Tokens = 31

Times used = 7 Times used * Frequency = 31

Semantics: [det]

Notice in the above forms for (22,4) that the
symbol 'det' does not occur in any form; this is because it
is, of course, a non-terminal symbol of the grammar GE1.
'det' introduces possessive adjectives ('padj') and
pronominal adjectives ('pronadj') into utterances through
rules (10,1) and (10,2).

(22,5) qadp -> det adjp

TERMINAL FORMS

Types  No. of       Form                                   Times rule used on form
       Derivations                                         (If different from 1)

1      1            conj pronadj adv adj
1      1            pronadj adj

Types = 2 Tokens = 2

Times used = 2 Times used * Frequency = 2



Semantics: [det] ∩ [adjp]
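
The four summary figures that follow each table of terminal forms can be recomputed from the rows of that table. The sketch below assumes one reading of the column headings: 'Types' counts distinct forms, 'Tokens' sums the per-form frequencies, 'Times used' sums the per-form rule uses, and 'Times used * Frequency' sums their products. The class and function names are hypothetical and are not those of the program that formatted these tables.

```python
from dataclasses import dataclass


@dataclass
class FormRow:
    frequency: int            # first column: utterances having this form
    derivations: int          # 'No. of Derivations' column
    form: str                 # the terminal form itself
    times_rule_used: int = 1  # last column, 1 when left blank


def rule_statistics(rows):
    """Recompute the summary figures under the reading assumed above."""
    return {
        "types": len(rows),
        "tokens": sum(r.frequency for r in rows),
        "times_used": sum(r.times_rule_used for r in rows),
        "times_used_x_frequency": sum(
            r.times_rule_used * r.frequency for r in rows
        ),
    }


# The two rows listed for rule (22,5).
rows_22_5 = [
    FormRow(1, 1, "conj pronadj adv adj"),
    FormRow(1, 1, "pronadj adj"),
]
stats_22_5 = rule_statistics(rows_22_5)
```

For rule (22,5) this reading reproduces the printed figures (2, 2, 2, 2); the same check can be applied to the longer tables.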

6. RULES INTRODUCING POSSESSIVES

The symbols 'padj' and 'pronadj' are the possessive
adjectives, which are introduced through the 'det' symbol.

(10,1) det -> pronadj
Types = 121 Tokens = 312

Times used = 144 Times used * Frequency = 341

Semantics: [pronadj]

(10,2) det -> padj
Types = 16 Tokens = 34

Times used = 17 Times used * Frequency = 36
Semantics: [padj]

7. NOUN-PHRASE RULES

Several sets of rules introduce noun phrases. The
proliferation of symbols is, again, to make a
reasonable probabilistic grammar. This proliferation is
prima facie disturbing, especially since many of the rules
have little semantic content. The explanation is that
noun-phrase constructions appear rather differently when
used in different parts of the utterance. In particular,
noun-phrases that stand as the whole utterance are rather
unlike noun-phrases that serve as the objects of
prepositions. See Chapter 4 for the parameters associated
with the rules of GE1.

(2,1) nounp -> pn
Types = 112 Tokens = 234

Times used = 137 Times used * Frequency = 269
Semantics: [pn]



(2,2) nounp -> n



Types = 650 Tokens = 2590

Times used = 1030 Times used * Frequency = 3^69



Semantics: [n]



(2,3) nounp -> pron



Types = 29^ Tokens = 1239

Times used = 385 Times used * Frequency = 1421



Semantics: [pron]



(13,1) np -> npsub prepp



TERMINAL FORMS

Types  No. of       Form                                   Times rule used on form
       Derivations                                         (If different from 1)

14     2            persp v pron prep pron
5      2            persp v persp prep art n
4      2            mod persp v pron prep pron
4      2            v persp prep art n
3      2            persp mod v pron prep pron
3      2            persp mod v persp prep persp
2      2            persp v n prep persp
2      2            persp v persp prep pn
2      2            persp v pron prep art n
2      2            persp v art n prep persp
2      2            persp v persp prep persp
2      2            persp v n prep pronadj n
2      2            persp v adj n prep persp
2      2            persp v pron prep pronadj n
2      2            persp mod v art n prep persp
2      2            persp v persp prep pronadj n
2      2            persp aux v n prep pronadj n
2      2            persp mod neg v art n prep pronadj n
2      1            pron link qu n prep persp
2      2            v persp prep n
2      1            v persp pron prep pron
2      2            v pron prep persp
1      1            aff prep n prep persp
1      2            aux n prep art n
1      2            aux pron prep art n
1      2            conj persp v pron prep pron
1      2            conj art n prep persp v art n
1      2            conj mod qu n prep n v neg persp
1      2            int persp v pron prep pron
1      2            intadv aux art n prep art n
1      2            inter pron prep art n
1      2            inter link pron prep pronadj n
1      2            mod persp v pron prep n
1      1            mod persp v persp prep n n
1      1            mod persp v n prep art n n
1      2            mod persp v pron prep art n
1      2            mod persp v persp prep qu n
1      2            mod persp v pronadj n prep n
1      2            mod persp v persp prep art n
1      2            mod persp v pronadj n prep pron
1      1            mod persp v pron prep pron art n
1      1            mod persp v prep pronadj n n prep art n
1      1            n n v n n prep art n
1      1            n n v prep pronadj n prep persp
1      2            n pn aux v n prep art n
1      2            n v art n prep persp
1      2            n v pron prep persp
1      2            n v persp prep persp
1      2            n v pronadj n prep art n
1      2            n v pronadj n prep persp
1      2            neg persp mod neg v pron prep pron
1      2            persp v art n prep n
1      2            persp v n prep art n
1      2            persp mod v pron prep n
1      2            persp v art n prep pron
1      1            persp link n prep art n
1      2            persp v qu n prep persp
1      1            persp v prep n prep pron
1      2            persp mod neg v n prep n
1      2            persp aux v n prep persp
1      2            persp v pron prep padj n
1      1            persp v prep persp prep n
1      1            persp aux v prep n prep n
1      1            persp v pron prep art pn n
1      2            persp v pron prep qu n aux
1      2            persp mod v persp prep pron
1      2            persp mod v persp prep art n
1      2            persp v art n prep art adj n
1      1            persp v pron prep art pron n
1      2            persp mod neg v n prep persp
1      1            persp v persp pron prep persp
1      1            persp v pron prep pronadj n n
1      1            persp mod v persp prep persp n
1      2            persp v art adj pron prep pron
1      2            persp aux v qu pron prep persp
1      1            persp mod v prep pron prep pron
1      2            persp mod v pronadj n prep persp
1      2            persp mod v persp prep pronadj n
1      2            persp mod neg v adj n prep persp
1      1            persp mod v persp prep pronadj n n
1      1            persp v prep art n adj n prep pronadj adj n
1      2            pn n mod neg v pron prep pron
1      1            pn v art n prep n n
1      2            pn v pron prep persp
1      1            pron link n prep art n
1      1            pron link n prep persp
1      1            pron link pron prep pron
1      2            pronadj n v art n prep persp
1      2            v art n prep pron
1      1            v art pron pron prep persp
1      2            v n prep n
1      2            v n prep persp
1      2            v persp prep persp
1      3            v persp prep n prep art n n
1      2            v pron prep art n n
1      2            v pronadj n prep pronadj n
1      2            v qu n prep art n

Types = 97 Tokens = 140

Times used = 100 Times used * Frequency = 143

Semantics: [npsub] ∩ [prepp]



This rule lets a prepositional phrase modify a noun
phrase. I have included the complete list of forms here to
supplement the discussion of semantic ambiguity in Section
II below.

Notice that many of these forms have two
derivations. The reason for this grammatical ambiguity is
that the prepositional phrase may alternatively be viewed
as an object of the verb instead of as a modifier to the
noun-phrase.
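
The two-derivation pattern can be reproduced on a very small scale. The grammar below is a hypothetical stand-in written only for this illustration; its rules and symbols are assumptions and are not the rules of GE1. It lets a prepositional phrase attach either to the noun phrase or to the verb, and a simple recursive counter then finds both derivations.

```python
from functools import lru_cache

# Hypothetical toy grammar (NOT GE1): a 'prepp' may modify the noun phrase
# or stand as a second object of the verb.
TOY_GRAMMAR = {
    "s": [("np", "vp")],
    "vp": [("v", "np"), ("v", "np", "prepp")],
    "np": [("npsub",), ("npsub", "prepp")],
    "npsub": [("persp",), ("pron",)],
    "prepp": [("prep", "np")],
}


def count_derivations(form, start="s", grammar=TOY_GRAMMAR):
    """Count the derivations of a terminal form such as 'persp v pron'."""
    words = tuple(form.split())

    @lru_cache(maxsize=None)
    def count(symbol, i, j):
        # Terminal symbols must match exactly one word of the form.
        if symbol not in grammar:
            return 1 if j - i == 1 and words[i] == symbol else 0
        return sum(count_seq(rhs, i, j) for rhs in grammar[symbol])

    @lru_cache(maxsize=None)
    def count_seq(rhs, i, j):
        # Ways to derive words[i:j] from a sequence of symbols.
        if not rhs:
            return 1 if i == j else 0
        head, rest = rhs[0], rhs[1:]
        total = 0
        for k in range(i + 1, j + 1):
            left = count(head, i, k)
            if left:
                total += left * count_seq(rest, k, j)
        return total

    return count(start, 0, len(words))


ambiguous = count_derivations("persp v pron prep pron")
unambiguous = count_derivations("persp v pron")
```

With this toy grammar the form 'persp v pron prep pron' receives two derivations and 'persp v pron' receives one, which mirrors the 'No. of Derivations' column for the corresponding rows of the table.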

(NOTE: Here the reader may note that rule (13,2)
has been removed from the grammar. I have retained this
numbering so that I don't confuse the computer program that
formats all the tables of this work.)

(13,3) np -> npsub conj npsub



TERMINAL FORMS

Types  No. of       Form                                   Times rule used on form
       Derivations                                         (If different from 1)

2                   pron link n conj n
1                   adj adj n conj pron aux v art n
1                   art n conj art n v prep art n
1                   conj pn conj pn aux v prep n
1                   n n conj persp v pron
1                   persp v pn conj pn
1                   persp link art n conj n
1                   persp v n conj pronadj n
1                   persp v pronadj pn conj n
1                   persp v pron conj qu pron
1                   persp v prep art n conj art n
1                   persp conj persp mod v pron
1                   persp v art adj pron conj art n
1                   persp mod neg v art n n conj art n
1                   persp mod neg v pronadj n conj pronadj n
1                   pn conj pn mod neg v art n
1                   pn prep pn conj persp
1                   prep pn conj pn
1                   prep pronadj n conj n
1                   pron link pn conj pn
1                   pron link pn pn conj pn
1                   pron link qu n conj art n
1                   pron link art n conj art n
1                   pronadj n conj pronadj n prep persp v

Types = 24 Tokens = 25

Times used = 24 Times used * Frequency = 25

Semantics: ([npsub]) ∪ ([npsub])

This rule conjoins noun phrases together with
conjunctions. I believe the correct function is union, as
in

2 pron link n conj n

representing

that's mommy and daddy.
there's mommy and daddy.

Consider

pronadj n conj pronadj n prep persp v
which has the denotation:

if (([pronadj] ∩ [n]) ∪ ([pronadj] ∩ [n])) ∩
{a | (∃<a,b> ∈ [prep]) (b ∈ [persp])}
⊆ [v] then TRUE else FALSE.
The original utterance is

my mommy and daddy 'fore it rain.
It contains the phrase 'fore it rain' as an adverbial
expression. Hence, the analysis is incorrect in this case,
and this is the only utterance represented by the form.

The use of the union function seems appropriate for
most of the utterances requiring rule (13,3). The 'conj'
is almost always the word 'and'.
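
The union semantics, and the truth condition quoted above for the form 'pronadj n conj pronadj n prep persp v', can be sketched with Python sets. The individuals and denotations here are invented for illustration, and the function names are hypothetical; only the set operations follow the text.

```python
def conjoin(left, right):
    """Semantics of the conjunction rule: the union of the two noun-phrase sets."""
    return left | right


def related_to(prep_pairs, persp_set):
    """{a | there is <a,b> in [prep] with b in [persp]}."""
    return {a for (a, b) in prep_pairs if b in persp_set}


def truth_value(pronadj1, n1, pronadj2, n2, prep_pairs, persp_set, v_set):
    """TRUE iff ((pronadj ∩ n) ∪ (pronadj ∩ n)) ∩ related_to(...) ⊆ [v]."""
    subject = conjoin(pronadj1 & n1, pronadj2 & n2)
    return (subject & related_to(prep_pairs, persp_set)) <= v_set


# Assumed toy denotations.
mine = {"mom", "dad", "quilt"}
mommy, daddy = {"mom"}, {"dad"}

mommy_and_daddy = conjoin(mine & mommy, mine & daddy)
holds = truth_value(mine, mommy, mine, daddy, {("mom", "it")}, {"it"}, {"mom"})
fails = truth_value(mine, mommy, mine, daddy, {("mom", "it")}, {"it"}, set())
```

The sketch only shows how the formula is evaluated once denotations are fixed; as noted in the text, the analysis itself is incorrect for the one utterance that has this form.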
Types = 868 Tokens = 3518

Times used = 1751 Times used * Frequency



Semantics: [npsub]



(17,1) npsub -> persp



Types = 525 Tokens = 2291

Times used = 692 Times used * Frequency



Semantics: [persp]



(17,2) npsub -> nounp



Types = 559 Tokens = 2546

Times used = 903 Times used * Frequency



Semantics: [nounp]



(17,3) npsub -> adp nounp



Types = 1^8 Tokens = 468

Times used = 220 Times used * Frequency



Semantics: [adp] ∩ [nounp]



(17,4) npsub -> quart nounp
Types = 288 Tokens = 937

Times used = 356 Times used * Frequency

Semantics: QUANTIF([quart], [nounp])
Rules (17,4) and (17,5) generate noun-phrases
modified by a 'quart'.

(17,5) npsub -> quart adjp nounp



TERMINAL FORMS

Types  No. of       Form                                   Times rule used on form
       Derivations                                         (If different from 1)



12 
10 

9 
8 
6 
6 
3 
3 
3 
2 
2 
2 
2 
2 
2 
2 



f 
1 



n 



persp y art adj ^n 
pron Iflnk art adJ n 
p«rsp V art ad J proa 
art adj n 

persp link art adJ n 
pron/ link art adj pron 
conjf art adJ n 
qu adj n 
qu adj n n 

ne^^ pron link art adj 
persp V qu adj n 
persp link art adj pron 
persp link neg art adj n 
^ersp link art adj adj n 
pron link art adj adj n 
V art adj n 

adv link art adj adj prou 

art adj n n 

art adj pron 

art adj adj n 

art adj adj n v 

art adj adj pron 

art adj adj adj n 

art adj pron perap v 

conj art adj adj n 

conj persp v iTrt adj n 

conj adv link art adj n 

conj pron link art adj pn 

conj persp link art adj n n 

conj persp v art adj adj proa 

c6nj pron link art adj aJj pron 

int pron aux v art adj n 

int pron link art adj adj n 

intadv aux qu adi n 

intadv aux art aaj n \ 



Types = 71



Intadv pertp v art adj ii 

Inter llnK qu adj n 

mod persp v qu adj n 

rood parsp v art adj pron 

n llnK art adj n 

n link art adj adj n 

n mod V prep art adj n 

neg persp link art adj n 

persp art adj adj n 

persp V art adj n n 

persp mod v qu adj n 

persp rood v art adj n 

persp V art adj adj n 

persp mod v art adj pron 

persp mod neg v qu adj n 

persp V prep art adj pron 

persp llnic prep art adj n 

persp mod neg v art adj n n 

persp V art n prep art adj n 

persp link neg art adv adj n 

parsp.ir art adj pron prep pron 

persp V art adj pron conj art n 

pn link art adj adj n / 

prep art adj n 

pron art adj n 

pron art adj adj n 

pron link neg art adj n f 

pron link pron qu adj n 

pron link art adj pron art a 

qu adj adj n 

qu adj* n V n n 

qu adj n mod neg 

qu pron link art adj n 

V art adj pron 

V persp art adj pron 

V qu adj n 
Tokens = 129

Times used = 73 Times used * Frequency = 131



Semantics: QUANTIF([quart], ([adjp] ∩ [nounp]))



8. VERB-PHRASE RULES



TERMINAL FORMS

Types  No. of       Form                                   Times rule used on form
       Derivations                                         (If different from 1)


402 

33 

29 

18 

1 6 

1 4 

13 

1 1 

10 

9 

7 

7 

7 

7 

6 

6 

6 

6 

6 

5 

5 

S 

4 

4 

4 

4 

4 

3 

3 

3 

3 

3 

3 

3 

3 

3 

3 

3 

3 

3 

3 

2 

2 

2 



perap mod neg v 

perap aux v 
persp mod v persp 
perap mod v 

perap mod v pron \ 

perap mod neg v perap 

perap mod neg v n 

perap aux v n 

perap mod v perap prep 

perep mod neg v pron 

perap aux v pron 

persp aux v art n 

perap mod v prep perap 

perap aux v prep art n 

perap aux neg v 

perap mod v art n 

perap aux v pronadj n 

perap mod nag v pronadj n 

perap mod neg v perap prep 

inter aux v prep 

perap aux v prep 

perap aux v perap 

perap mod v n 

perap mod v prep 

perap mod neg v art n 

perap aux v prep perap 

qu n aux v 

art n aux v prep 

con J perap aux v 

n aux V 

n perap mod v persp 
perap mod v qu n 
perap aux v prep n 
perap mod neg v prep 
perap mod neg v qu n 
perap mod v pron prep 
prep pron 
prep pronadj n 
pron prep pron 
persp prep persp 
V pronadj n prep 



V 
V 
V 
V 



persp mod 
persp axix 
pei sp mod 
persp mod 
persp mod neg 
art n aux v 
art n persp mod v 
con J persp mod neg v 



2 

2 



prep 



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n%q p^rsp mod neg v prep 

persp aux v pn 

persp aux v qu n 

parsp .aux n^tg v prep 

persp aux nog v pron 

perap mod v pronadj n 

parap aux nag v perap 

perap aux v prap qu n 

parap mod v perap art n 

parap aux v perap art n 

perap aux v prap arc n n 

parap mod v prap pronadj n 

perap mod neg v art n prep 

perap mod v art n prep perap 2: 

perap aux v n prep pronadj a 2 

perap mod neg v prep pronadj n 

perap mod neg v art n prep pronadj n 

pton aux V 

pron mod v prep \ 

pron aux v prep 

adj adj n mod neg v qu n 

adj adj n conj pron aux v arc n 

adj n mod negzv 

adj n perap mod neg v art n 

a^dv perap aux v 

aff perap mod v persp 

art n mod neo v 

art n rood neg v neg 

art a mod neg v prep 

conj art n mod v 

conj pn n aux v n 

conj perap mod v n i 
conj perap aux v n 

conj perap rood v pron -.^i*^ 

conj perap mod neg^ n 

conj pn mod v prep perap 

conj persp mod neg v neg 

conj pron mod neg v prep 

conj persp aux v prep pron 

conj perap mod v prep art n 

conj perap aux v )?rep adj a 

conj per^ mod v art n prep 

conj pn conj pn aux v prep n 

conj perap mod neg v persp t^rat^ 

conj perap perap mod v prep pronadj 

Int perap aux v 

Int pron aux v art adj n 

mt perao mod v persp prap 

Inter perap raodzv 

mod neg v pronadj n prep 



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n aux V nag parsp 

n mod V 
n mod nag v 
n mod V pron 
n mod V art n 
n mod nag v n * 
n mod nag v prap 
n mod nag v art n 
n mod V parap prap 
n mod V prap art ad J i 
n n mod v prap pron 
n n mod nag v art n 
n parap nod v 
n parsp mod nag v 
n perap mod v prap 
n parsp mod v parap. prsp 
n parap aux v prap art n 
n pn aux v prap 
n pn mod nag v A 
n pn aux v n prap art n 
n pn mod nag v prap art^ 
nag pronadj n aux v 
nag parap aux nag v 
nag parap mod v parap 
nag parap mod nag v pron 
nag parap aux v prap t>arsp 
nag parap mod v prap pronadj n 
nag. parap mod nag v pron prap pron 
parap mod v n n 
parap aux v n n 
paran aux v Int 
parsp mod v adj n 
parsp aux v ad J n 
parap mod v prap n 
parap liod v art n n 
parap mod v parsp n 
parsp mod v prap pn 
parap mod nag v n n 
parap mod nag v int 
parap aux v art n n 
p^rsp mod v qu pron 
parsp mod v adJ pron 
parsp mod v qu adJ n 
parap mod v art adj n 
parap mod nag v n af f 
parap mod nag v adJ n 
parap aux v prap pron 
parsp mod v prap qu n 
parsp mod v art n prap 
parsp mod nag v n prap 



/ 
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1 P«r8p BOd n«g V pr«p n 

1 PM«P nod nag v art n n \ 

2 parap mod v pron prap a 2^ 
1 P^tmp mod V prap padj n ^ 

1 parap mod v art adj pron 

2 poop nod nag v n prap n 2 
2 9^^^P V n prap parap 2 
1 porsp nod nag v qu adJ n 

1 parap mod nag v prap pron 

1 porap mod v prap adj pron 

1 poop nod V qu pron art n 

1 porap aux v prap n prap n 

1 porap mod v pronadj n prap 

1 porap mod nag v pjrap parap 

1 porap nod nag y prap art n 
1 X porap aux nag v prap art n 

1 porsp nod nag v art adj n n 

2 parap mor v parsp prap pron 2 
2 poop nod V parap prop art n 2 
2 porsp nod nag v n prap parap 2 



parap mod nag v pronadj adj n 
parap mod nag v art pron pron 
parsp aux v prap parap padj n 
parap rood v parap prap parajp n 
parap conj parap mod v qu pron 
parap aux v qu pron prap persp 4 
parap mod v prap pron prap pron i 



1 pwmp mod nag v prap art pron n 

2 parap mod v pronadj n prap parap 2 
2 parsp mod v parsp prap pronadj n 2 
2 parap nod nag v adj n pr^p parsp i 



parap nod v parap prap pronadj n 
parap mod nag v art n n conj art n 
parap mod nag v pronadj n conj pronadj n 
pn aux V parap 
pn conj pn mod nag v art n 
pn nod nag v parap 
pn mod If prap psrsp 

pn n nod nag v pron prap pron d 

pron aux v n 

pron aux v pn 

pron nod V pron 

pron aux v pron 

pron mod v parap 

pron aux v art n 

pron aux v parsp 

pron mod nag v n 

pron mod nag v prap 

pron mod v pronadj n 

pron mod V prap parsp 
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1 1 pron p«r8p aux v pr«p 

1 1 pronadj n aux v pronadj n 

1 1 qu n mod nag v pron 

1 1 qu n food V prap pronadj n / 

Types = 198 Tokens = 870
Times used = 216 Times used * Frequency = 895

Semantics: AUXFCN([auxilp], [vp])

AUXFCN is defined by the following:
AUXFCN([auxilp], [vp]) =

IF auxilp does not contain (syntactically) any
member of the class 'neg'
THEN [auxilp] ∩ [vp]

ELSE

(D^3 ∪ D^2 ∪ D) - ([auxilp] ∩ [vp]),
where D is the domain of the model.



Notice below the semantics of rule (16,2), which
effectively ignores the 'neg' in the denotation [auxilp].

From the view of semantics, some of the rules that
introduce the negating particle ('neg') are awkward.
(Rules discussed here are (16,2) and (19,2).) These rules
introduce 'neg' at a point in the utterance where the
complementation function cannot be used on the set to be
complemented, since it is not available at that point in
the generation. Instead, the effect of complementation is
handled later by the special function AUXFCN.
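
The behavior of AUXFCN can be sketched as follows. This is an illustration under stated assumptions rather than the program used for this work: denotations are modeled as Python sets whose members are individuals, pairs, or triples over a small domain D, and the syntactic test for 'neg' is passed in as a boolean flag. The function and parameter names are hypothetical.

```python
from itertools import product


def universe(domain):
    """D ∪ D^2 ∪ D^3: individuals, ordered pairs, and ordered triples over D."""
    singles = set(domain)
    pairs = set(product(domain, repeat=2))
    triples = set(product(domain, repeat=3))
    return singles | pairs | triples


def auxfcn(auxilp_set, vp_set, domain, contains_neg):
    """Intersect when no 'neg' is present; otherwise complement the intersection."""
    meet = auxilp_set & vp_set
    if not contains_neg:
        return meet
    return universe(domain) - meet


# Assumed toy model with two individuals.
D = {"a", "b"}
plain = auxfcn({"a", "b"}, {"a"}, D, contains_neg=False)
negated = auxfcn({"a", "b"}, {"a"}, D, contains_neg=True)
```

Deferring the complement to this function is what allows the rule that introduces 'neg' to carry the denotation [auxil] unchanged.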

Syntactically, however, these rules describe the
generation of the utterances in question very well.
Allowing for the generation of 'neg' at the right point in
the utterance would necessitate a proliferation of rules.
I stress that this is no problem in the semantics, only a
slight slippage between the surface syntax and the
semantics. I have chosen to proliferate rules of the
grammar only when it was either necessary from a semantic
view, or to improve the probabilistic fit. Adding rules to
introduce 'neg' at the elegant point would not have been
justified in either of these ways.

Types = 256 Tokens = 897

Times used = 284 Times used * Frequency = 951

Semantics: [vp]

(16,1) auxilp -> auxil

Types = 243 Tokens = 703
Times used = 267 Times used * Frequency = 736

Semantics: [auxil]

TERMINAL FORMS

Types  No. of       Form                                   Times rule used on form
       Derivations                                         (If different from 1)

perap mod/ neg v 

mod neg /V 

p^mp ipbd neg 

per8p/4od neg v persp 

persjp mod neg v n 

mod neg persp 

pej^flp mod neg v pron 

neg perep aux neg 

persp aux neg 

\ax neg persp 

persp aux neg v 

persp mod neg v pronadj n 

persp mod neg v perap prep 

neg persp mod neg 

persp mod neg v art n 

n mod neg 

persp mod neg v prep 

perso mod neg v qu n 

persp mod neg v pronadj n prep 

pron lux neg 

aux neg pron 

conj persp mod ne^ v prep 

neg persp mod neg v prep 

persp aux neg v prep 

persp aux neg v pron 

persp aux neg v persp 

persp mod neg v art n prep 

per so mod neg v prep pronadj n 

pOiTsp mod nej v art n prep prunadj 

pron mod neg 

adj adj n mod neg v qu n 

adj H mod neg v 

adj n persp mod neg v art n 

art n mod neg 

art n mod neg v 

art n mod ne^ v neg 

art n mod neg v prep 

con J persp mod neg v n 

conJ persp mod neg v nec, 

con J pron mod neg v prep 

conj persp mod- neg • v persp Tprep ^ - 

int mod neg qu n v prep 

Intadv mod neg persp v n 

rood neg v int 

mod neg n pron 

mod neg persp n 

mod neg v pronadj a prep 

n mod neg v 

n mod neg v n 

n mod neg v prep 
n mod neg v art n
n n mod neg v art n
n persp mod neg v
n pn mod neg v n
n pn mod neg v prep art n
neg n mod neg
neg persp aux neg v
neg persp mod neg v pron
neg persp mod neg v pron prep pron                          2
persp mod neg v n n
persp mod neg v int
persp mod neg v n aff
persp mod neg v adj n
persp mod neg v n prep
persp mod neg v prep n
persp mod neg v art n n
persp mod neg v n prep n                                    2
persp mod neg v qu adj n
persp mod neg v prep pron
persp mod neg v prep persp
persp mod neg v prep art n
persp aux neg v prep art n
persp mod neg v art adj n n
persp mod neg v n prep persp                                2
persp mod neg v pronadj adj n
persp mod neg v art pron pron
persp mod neg v prep art pron n
persp mod neg v adj n prep persp                            2
persp mod neg v art n n conj art n
persp mod neg v pronadj n conj pronadj n
pn conj pn mod neg v art n
pn mod neg
pn mod neg v persp
pn n mod neg v pron prep pron
pron mod neg v n
pron mod neg v prep
pronadj n aux neg
qu adj n mod neg
qu n mod neg v pron

Types = 89 Tokens = 631

Times used = 95 Times used * Frequency = 633



Semantics: [auxil]

Notice that the semantics for this does not include
any effect of the negating particle. See the discussion
following rule (5,1) for an explanation, and also Section
II.



(15,1) auxil -> aux



Types = 112 Tokens = 328

Times used = 120 Times used * Frequency = 337



Semantics: [aux]



TERMINAL FORMS

Types  No. of       Form                                   Times rule used on form
       Derivations                                         (If different from 1)

402                 persp mod neg v
29                  persp mod v persp
27                  mod persp v persp
24                  adv persp mod
23                  mod neg v
19                  persp mod
18                  persp mod v
18                  persp mod neg
17                  mod persp v pron
16                  persp mod v pron
14                  persp mod neg v persp
13                  mod persp v prep persp
13                  persp mod neg v n
10                  mod neg persp
10                  mod persp v
10                  persp mod v persp prep
9                   aff persp mod
9                   mod persp v art n
9                   persp mod neg v pron
8                   mod v
7                   persp mod v prep persp
6                   persp v neg mod
6                   persp mod v art n
6                   persp mod neg v pronadj n
6      1            persp mod neg v persp prep
5      2            mod persp
5                   mod persp v n
5                   mod persp v prep
5                   mod persp v prep n
4                   mod persp v pronadj n
4                   mod persp v pron prep pron
4                   neg mod persp v persp
4                   neg persp mod neg
4                   persp mod v n
4                   persp mod v prep
4                   persp mod neg v art n
3                   mod persp v qu n
3                   mod persp v prep pron
3                   mod persp v prep pronadj n
3                   n mod neg
3                   n persp mod v persp
3                   persp mod v qu n
3                   persp mod neg v prep
3      1            persp mod neg v qu n
3                   persp mod v pron prep
3      1            persp mod v prep pron
3      2            persp mod v pron prep pron
3      2            persp mod v persp prep persp
3                   persp mod neg v pronadj n prep
2      4            art n persp mod v
2                   conj persp mod neg v prep
2                   intadv mod persp v
2                   inter mod persp v
2      2            mod pron
2                   mod persp v n n
2                   mod persp v adj n
2                   mod persp v qu pron
2                   mod persp v persp prep
2                   neg persp mod neg v prep
2                   persp mod v pronadj n
2      1            persp mod v persp art n
2                   persp mod v prep pronadj n
2      1            persp mod neg v art n prep
2      2            persp mod v art n prep persp
2      1            persp mod neg v prep pronadj n
2      2            persp mod neg v art n prep pronadj
2      1            pron mod neg
2      1            pron mod v prep
1      1            adj adj n mod neg v qu n
1      1            adj n mod neg v
1      1            adj n persp mod neg v art n
1      1            aff mod persp n
1      1            aff persp mod v persp
1      1            art n mod neg
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art n mod neg v 

art n mod neg v neg 

art n mod nag v prep 

con J pron mod 

conj art n mod v 

ronj persp rood v n 

conj persp mod v pron 

conj persp mod neg v n 

conj pn mod v prep persp 

conj persp mod neg v neg . 

conj pron mod neg v prep 

conj persp mod v prep art n 

conj persp mod v art n prep 

conj persp mod neg v persp prep 

conj mod qu n prep n v neg persp 

conj persp persp mod v prep pronadj 

Int mod persp v 

Int mod persp v n 

int mod neg qu n v prep 

int persp mod v persp prep 

intadv mod neg persp v n 

inter persp mod v 

mod art n n 

mod art n v n 

mod n V a n 

mod n V persp 

mod neg v int 

mod neg n pron 

mod neg i^ersp n 

mod n V prep art n 

mod neg v pronadj n t^rep 

mod pn V n 

mod persp n 

moa pronadj n 

rood proa v prep 

mod pronadj n v 

mod persp v art n n 

mod persp v persp n 

mod pronadj n v pron 

mod persp v qu adj n 

mod pron v prep art n 

mod persp v qu pfon n 

mod persp v prep qu n 

mod persp v prep pn pn 

mod persp v prap'art n 

mod persp v pron prep n 

mod persp v prep art n a 

mod persp v art adj pron 

mod persp v pronadj adj n . 

mod persp v persp prep n n 
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1 mod persp v n prep art n n 

2 mod persp v pron prep art i 
2 mod persp^ v persp prep qu n ^ 
2 mod perhp v pronadj n prep n z 
2 mod persp v persp prep art n ^ 
2 Mod persp v pronadj n prep pron 2 
1 mod persp v pron prep pron art n 

mod persp v prep pronadj n n prep art n 
1 ' n mod V 

^ n mod neg v 

1 n mod V pron 

1 n mod V art n 

1 n mod neg v n 

1 n mod neg v prep 

1 n mod neg v art n 

\ n rood V persp prep 

1 n mod V prep art adj n 

1 n n mod v prep pron 

1 n n mod neg v art n 

1 n persp mod v 

% n persp mod neg v 

n persp mod v prep 

n persp mod v persp prep 

1 n pn mod neg v n 

^ n pn mod neg v prep art n 

2 neg mod persp 

neq mod persp v pron 
1 neg mod persp v prep pers^> n 

aeg n mod r eg v 
^ jieg persp mod v persp 

neg persp mod neg v pron 

1 neg persp mod v prep proaadj n 

2 neg persp mod ne^ v pron prep pron 2 



1 persp mod v n n
1 persp v persp mod
1 persp mod v adj n
persp mod v prep n
1 persp mod v art n n
persp mod v persp n
persp mod v prep pn
persp mod neg v n n
1 persp mod neg v int
1 persp mod v qu pron
1 persp mod v adj pron
persp mod v qu adj n
1 persp mod v art adj n
1 persp mod neg v n aff
1 persp mod neg v adj n
1 persp mod v prep qu n
1 persp mod v art n prep
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2 
2 
2 



2 
2 
2 



persp rood nag v n prep 
persp mod neg v prep n 
)6r8p mod neg v art n n 
irsp mo4 v pron prep n 




rsp mod 
mod 
pereo mod 
persp, mod 
persp 
persp 
pfcrsp 
persp 
persp mod' 
persp mod 
persp rood 
persp mod 
persp mod 
persp mod neg 
persp mod neg 



V prep pad J n 

V art ad J pron 
neg v n prep n 
neg v qu adj n 
neg v prep pron 

V prep adj pron 

V qu pron art n 

V pronadj n prep 
neg v prep persp 

eg V prep art n 
eg V art adj n n 

V persp prep pron 

V persp prep art n 

V n prep persp 

V pronadj adj i) 



2 

2 



persp mod neg v art pron pron 
persp mod v persp prep persp n 
persp conj persp mod v qu pron 
persp mod v prep pron prep pron 
persp mod neg v prep art pron 
persp mod v pronadj n prep persp 
persp mod v persp prep pronadj n 
persp mod neg v adj n prep persp 
per^p mod v persp prep pronadj n n 
persp mod neg v art n n conj art n 
persp mod neg v pronadj n conj pronadj 
pn conj pn rood neg v arc n 
pn mod 
mod neg 

rood neg v pfi^rsp 
rood V prep p^rsp 

nag v pron prep pron 



2 
i 
2 



n 



Types m 220 
Times used ■ 



pn 
pn 
pn 

pn n mod 
pron rood 
pron rood 
pron rood 
pron rood 
pron mod 
pron rood 
pron mod 
qu adj n 
qu n rood 
qu n rood 
Tokens « 
242 



V prou 

V persp 
neg v n 
neg v prep 

V pronadj n 

V prep persp 
rood neg 

neg v pron 

V prep pronadj n 
1006 

Tiroes used * Frequency 
Semantics:



Types = 86 Tokens = 778

Times used = 86 Times used * Frequency = 778



Semantics: [verb]



Types No* of 

Derivations 



TERMINJ^L FORMS 

Form Times rule used On ioaa 
(!£ different from 1) 



 14     persp v prep 
  5     inter aux v prep 
  5     mod persp v prep 
  5     persp aux v prep 
  4     persp mod v prep 
  3     art n aux v prep 
  3     conj persp v prep 
  3     persp mod neg v prep 
  3     pron v prep 
  2     aux persp v prep 
  2     conj pron v prep 
  2     conj persp mod neg v prep 
  2     inter persp v prep 
  2     n v prep 
  2     neg persp mod neg prep 
  2     persp aux neg v prep 
  2     pron mod v prep 
  2     pron aux v prep 
        adj adj n v prep 
        art n v prep 
        art n mod neg v prep 
        conj pron mod neg v prep 
        int mod neg qu n v prep 
        mod pron v prep 
        n mod neg v prep 
        n n v prep 
        n persp v prep 
        n persp mod v prep 
        n pn aux v prep 
        pn pn v prep 
        pron mod neg v prep 
        pron persp aux v prep 
        qu n v prep 
        v prep pronadj n prep 

Types = 34          Tokens = 79 
Times used = 34     Times used * Frequency = 79 



Semantics:   [COMBINE( [verb] , PREP )] 

COMBINE is a purely syntactic function, discussed 
in Chapter 5. It joins a verb to its associated 
preposition prior to semantic analysis. This is reasonable 
enough, as in some of the following utterances represented 
by 

14 persp v prep 

(From: persp v prep,adv) 

he comed out. 
he stand up. 
he wake up. 
i get up. 
i get in. 
they climb up. 

(From: persp v,mod prep) 

3 i want to. 

he wants to. 

(Remark: Here it is incorrect to COMBINE the verb 'want' 
with the preposition 'to'.) 

(From: persp v prep) 

he looking for... 

it turn on. 

she talking about. 

(From: 1 persp v,mod prep,adv) 
no go out. 

This rule, (3,2), has a minor problem when used in 
conjunction with rule (11,2). That difficulty is discussed 
below. 
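
The role of COMBINE in rule (3,2) can be sketched as follows. This is only an illustration in Python, not the LISP-type implementation referred to in Chapter 5: the function names `combine`, `denote_verb_prep`, and `sentence_true`, the toy lexicon, and every denotation in it are assumptions made for the example.

```python
# Hypothetical toy lexicon: each (possibly combined) verb maps to the set of
# individuals it is true of. None of these values come from the ERICA corpus.
LEXICON = {
    "get": {"erica", "daddy"},
    "get up": {"erica"},
    "wake up": {"he"},
}

def combine(verb, prep):
    """Purely syntactic step: join a verb to its associated preposition."""
    return f"{verb} {prep}"

def denote_verb_prep(verb, prep):
    """Denotation for a 'verb prep' phrase: look up the combined form as a unit."""
    return LEXICON.get(combine(verb, prep), set())

def sentence_true(subject_denotation, verb, prep):
    """'persp v prep' is TRUE when [persp] is a subset of [COMBINE(verb, PREP)]."""
    return subject_denotation <= denote_verb_prep(verb, prep)
```

The point of the sketch is the order of operations: the verb and preposition are joined first, and only the joined form is given a denotation, so 'get up' need not be computed from the separate meanings of 'get' and 'up'.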

\ 



Types = 228         Tokens = 768 

Times used = 230    Times used * Frequency = 770 



Semantics: 



{ a | ( ∃<a,b> ∈ [verb] ) ( b ∈ [np] ) } 

(3,4) vp -> verb np np 

TERMINAL FORMS 

Types   No. of          Form    Times rule used on form 
        Derivations             (If different from 1) 


15 
4 

2 
2 
2 
2 
2 
2 



persp V n n 
persp V persp n 
aux persp v qu n n 
mod persp v n n 
persp V art n n 
persp V art pron pron 
persp mod v p^^rsp art n 
persp aux v persp art n 
coaj persp v qu n n 
con J persp v pronadj n n 
mod n V n n 
mod persp v 
mod persp v 
mod persp v 
mod persp v 
mod persp v 
mod persp v 



art n n 
persp n 
qu pron n 
persp prep n n 
n prep art n n 
pron prep pron 



art n 
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n adj n V n pn 
n n V n n prep art n 
neg persp v persp pron 
pdrsp V pn n 
persp V qu n n 
persp mod v n n 
persp V neg n n 
persp V ad J 



persp aux v 
persp 
persp 
persp 



n n 
n n 

V art pron n 
n 



Types 3 56 
Times used 



V pron art 

V art n pron 
persp-»mod v art n n 
persp V art n art n 
persp V art adJ n n 
persp mod v persp n 
persp V pronadj n n 
persp mod neg v n n 
per^P aux v art n n 
persp rood neg v art n n 
persp V pronadj pron pron 
persp mod v qu pron arc n 
persp V pron prep art pn n 
persp mod ne^ v art ad J a n 
persp V pron prep art proii n 
persp V persp pron prep persp 
persp mod neg v art pron pron 
persp V pron prep pronadj n n 
persp mod v persp prep persp n 
persp mod v persp prep pronadj n n 
persp mod neg v art n n conj Arc n 
pn V art n n 

pn V persp pron 
pn V persp adJ n 
pn V artL n prep n n 
pn V persp pronadj adJ n 
pron V n n 
pron V art n n 
qu adJ n V n n 
Tokens » 79 
56 Times used • Frequency « 79 



Semantics: 



{ a | ( ∃<a,b,c> ∈ [verb] ) 
( b ∈ [np2] ∧ c ∈ [np1] ) } 



(As mentioned in Chapter 5, the numbers following 
the 'np' symbols in the above denotation indicate the order 
of the symbols in the utterance. If no numbers occur, then 
the order in the denotation is the same as the order in the 
string under examination. Recall that the use of 
set-language in giving the semantics of a string is a 
formal abbreviation since the formal notion is that of 
a LISP-type expression of arguments and functions. See 
Chapter 5.) 

Rule (3,4) handles a case of a verb phrase where 
the first noun-phrase following the verb is the indirect 
object, and the second is the direct object of the verb. 
Recall that verbs are a subset of 

D↑3 ∪ D↑2 ∪ D 

and that the verb therefore, if it takes both direct and 
indirect objects, will have as elements ordered triples of 
the form 

<subject,direct-object,indirect-object>. 

Many of these utterances are incorrectly described 
by this semantic rule. Very frequently, more subtle 
markings are needed in the dictionary to indicate how many 
objects the verbs may take. Many words (such as 'apple', 
'alphabet') are classed only as nouns, while they are 
clearly used as adjectives in some utterances involved 
here. 
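
The set-language denotation for rule (3,4) can be illustrated with a small sketch. It is a Python approximation under stated assumptions, not the LISP-type expression of the original system: the function name `vp_verb_np_np` and the toy verb denotations are hypothetical, and every element of a verb denotation is represented as a tuple of length 1, 2, or 3 so that it can stand for a member of D, D↑2, or D↑3.

```python
def vp_verb_np_np(verb, np1, np2):
    """Rule (3,4): { a | exists <a,b,c> in [verb], b in [np2], c in [np1] }.

    np1 is the first noun-phrase after the verb (indirect object),
    np2 the second (direct object). Triples in the verb denotation are
    ordered <subject, direct-object, indirect-object>.
    """
    return {t[0] for t in verb if len(t) == 3 and t[1] in np2 and t[2] in np1}

# Hypothetical denotations, for illustration only.
BRINGS = {("he", "toys", "me"), ("she", "crackers", "him")}
BUY = {("i", "juice")}  # a two-place verb: only pairs, no triples

# 'he brings me toys': the indirect-object reading succeeds.
subjects_bring = vp_verb_np_np(BRINGS, np1={"me"}, np2={"toys"})

# 'i buy apple juice': treating 'apple' as an indirect object finds nothing.
subjects_buy = vp_verb_np_np(BUY, np1={"apple"}, np2={"juice"})
```

The second call shows in miniature why a noun-noun compound misclassified as two noun-phrases yields an incorrect (here, empty) denotation under this rule.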

The following utterances are all incorrectly 
described by the semantics, whereby 

15 [persp v n n] = 

If [persp] ⊆ { a | ( ∃<a,b,c> ∈ [v] ) 
( b ∈ [n2] ∧ c ∈ [n1] ) } 
then TRUE else FALSE 



The utterances represented are: 
(From: persp v n n) 

2 it goes duck, duck. 

he goes meow, meow. 
he says moo, moo. 
i buy apple juice. 
it goes ding, ding. 



(From: persp v,mod n n) 

2 i want orange juice. 

it go ding, ding. 

i want alphabet cereal. 

(From: persp v,aux n n) 

i have bubble gum. 
she has baby lizards. 
we have syrup pot. 
you have coffee cake. 


However, other utterances are correct, as in 

4 [persp v persp n] = 

If [persp] ⊆ { a | ( ∃<a,b,c> ∈ [v] ) 

( b ∈ [n] ∧ c ∈ [persp] ) } 
then TRUE else FALSE 






which represents 

(From: persp v persp n) 

he brings me toys. 
i gave him crackers. 
i put it back. 

(Remark: There is clearly a dictionary error on the word 
'back'.) 

(From: persp v,aux persp n) 

i do it kitty. 
(Remark: Here, the order of objects is inverted.) 

Also, examine 
2 persp aux v persp art n 

(From: persp#aux,persp#link v persp art n,v) 
2 he's giving him a kiss. 

which are correctly analyzed. 

Let me stress the following point. While I have 
found many cases that do not work in this semantics, and 
consequently am forced to say that it needs reworking, the 
methods of lexical disambiguation used are often 
impressive. Notice the above utterances deriving from 
'persp#aux,persp#link v persp art n,v'. This lexical form 
represents four alternatives. Only one of these is 
recognized by GE1; it is the correct representation, I 
would agree. I am personally convinced that more subtle 
dictionaries and grammars can solve the problems of 
disambiguation at a surface level in more cases than might 
have previously been thought possible. 
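
The count of four alternatives can be checked mechanically. The sketch below is a hedged illustration in Python; it assumes, from the way lexical forms are written in this chapter, that a comma separates the alternative classifications of one word and that '#' joins the parts of a contraction. The function name `alternatives` is hypothetical.

```python
from itertools import product

def alternatives(lexical_form):
    """Expand a lexical form into its unambiguous terminal forms.

    Assumed notation: words are separated by spaces, the alternative
    classifications of a single word by commas, and '#' joins the parts
    of a contraction (so 'persp#aux' expands to 'persp aux').
    """
    choices = [word.split(",") for word in lexical_form.split()]
    return [" ".join(choice).replace("#", " ") for choice in product(*choices)]

forms = alternatives("persp#aux,persp#link v persp art n,v")
```

Two ambiguous words with two readings each give 2 x 2 = 4 terminal forms; the first of them, 'persp aux v persp art n', is the terminal form listed above for "he's giving him a kiss".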

Negating words ('neg') can occur in conjunction 
with rule (3,4). An example is 



1 [persp v neg n n] = 

if [persp] ⊆ { a | ( ∃<a,b,c> ∈ 

((D↑3 ∪ D↑2 ∪ D) ~ [v]) ) 
( b ∈ [n2] ∧ c ∈ [n1] ) } 
then TRUE else FALSE. 

The utterance involved is 

he has no back seat 
so the denotation is incorrect in this case: the word 'no' 
is here a quantifier, and 'back#seat' should be a noun. 



Types   No. of          Form    Times rule used on form 
        Derivations             (If different from 1) 



2 
2 
2 



persp v prep art n n 

persp v prep pronadj n n 

persp aux v prep art n n 

aux persp v prep n n 

conj persp v prep n n 

mod persp v prep pn pn 

mod persp v prep art n n 

mod persp v prep proaad j n n pr^p l 

neg mod p^^rsp v prep persp 44 
persp v prep pn pn 
persp v prep pron art n 
persp v prep art n adj n 
persp v prep pronadj n pn 
persp aux v prep persp padj n 
persp mod neg v prep art pron n 
persp v prep art n adj n prep pronadj adj n 
pn v prep pn n 
Types = 17          Tokens = 20 
Times used = 17     Times used * Frequency = 20 



Semantics: 



{ a | ( ∃<a,b,c> ∈ [verb] ) 
( b ∈ [np] ∧ c ∈ [prepp] ) } 



The prepositional phrase ('prepp') is the indirect 
object of the verb, and the noun-phrase is the direct 
object. For example, 

2 [persp v prep art n n] = 

If [persp] ⊆ { a | ( ∃<a,b,c> ∈ [v] ) 

( b ∈ [n] ∧ 

c ∈ { a | ( ∃<a,b> ∈ [prep] ) 

( b ∈ QUANTIF([art],[n]) ) } ) } 

then TRUE else FALSE 

represents the utterances 

(From: persp v prep art n n) 

he get over the tape recorder 

(Remark: Dictionary error: 

'tape#recorder' should be a noun) 

(From: persp v,mod prep,adv art n n) 
he go in the bath tub 

(Remark: Dictionary error: 

'bath#tub' should be a noun) 
Also, consider the utterances representing 

2 persp v prep pronadj n n 

which are 

i eat with my mommy hamburger. 
you sit on my suit pants. 

(Remark: 'suit#pants' should be a noun) 

Most of the applications of rule (3,5) seem to be 
failures. 



(3,6) vp -> verb np prepp 

TERMINAL FORMS 

Types   No. of          Form    Times rule used on form 

        Derivations             (If different from 1) 



14 2 persp v pron prep pron 
 5 2 persp v persp prep art n 
 4 2 mod persp v pron prep pron 
 3 2 persp mod v pron prep pron 
 3 2 persp mod v persp prep persp 
 2 2 persp v n prep persp 
 2 2 persp v persp prep pn 
 2 2 persp v pron prep art n 
 2 2 persp v art n prep persp 
 2 2 persp v persp prep persp 
 2 2 persp v n prep pronadj n 
 2 2 persp v adj n prep persp 
 2 2 persp v pron prep pronadj n 
 2 2 persp mod v art n prep persp 
 2 2 persp v persp prep pronadj n 
 2 2 persp aux v n prep pronadj n 
 2 2 persp mod neg v art n prep pronadj n 

   2 conj persp v pron prep pron 
   2 int persp v pron prep pron 
   2 mod persp v pron prep n 
   2 mod persp v pron prep art n 
   2 mod persp v persp prep qu n 

2 mod p^rsp v pronaij n prep 4* 

   2 mod persp v persp prep art n 
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Types 



2 
2 
2 
2 
2 
2 
2 
2 
2 
2 
2 
2 
2 
2 
2 
2 
? 
2 
2 
2 
2 
2 
2 
2 
2 
2 
2 
2 
2 

53 



Times used ^ 



rood parsp v pronadj n t^re^ proa 
ti pn aux V u pr^p arc a 
n V art a prep persp 
n V pron prep persp 
r& V persp prep persp 
n V pronadj n prep ^rt n 
n V pronaij n prep pera^ 
neg persp mod neg v pron ^rep proa 
pe«:3p V art n prep n 
persp V n prep art a 
persp mod v proa prep a 
persp V art n prep proa 
persp V qu n prep persp 
per^p mod nej v u prep u 
persp aux v n prep persp 
persp V proiA prep padj n 
persp V proii prep qu a aux 
persp mod v persp prep prow 
persp mcy\ V persp prep art n 
persp V art n prep arc adj n 
n^rsp mod neg v n prep persp 
persp V art adJ pron prap pro*, 
persp aux v qu proii pre^^ per^i> 
persp mod v pronadj n pret> persp 
pers^ mod v persp prep pronndj n 
persp mod ne.j v adj n prep persp 
pn n rood neg v pron prep pron 
pn V proii prep persp 
pronadj n v art n prep pars^^ 
Tokens = 69 
53 Tl'nes used * irrequency a 8:^ 



Semantics: 



{ a | ( ∃<a,b,c> ∈ [verb] ) 
( b ∈ [np] ∧ c ∈ [prepp] ) } 



Notice that the forms using rule (3,6) are all 
grammatically ambiguous. The ambiguity is whether or not 
the prepositional phrase is an object of the verb or a 
modifier of the noun phrase preceding it. See the 
discussion of grammatical ambiguity below. 
TERMINAL FORMS 

Types   No. of          Form    Times rule used on form 
        Derivations             (If different from 1) 



14 

12 

10 

6 

3 

3 

3 

3 

2 

2 

2 

2 



Types a 29 
Times used 



persp v persp prep 
persp v pronadj n prep 
persp mod v persp prep 
persp mod neg v persp prep 
n v persp prep 
persp v art n prep 
persp mod v pron prep 
persp mod neg v pronadj n prep 
mod persp v persp prep 
persp v n prep 
persp v pron prep 
persp mod neg v art n prep 
art n v persp prep 
conj persp v art n prep 
conj persp v pronadj n prep 
conj persp mod v art n prep 
conj persp mod neg v persp prep 
int pn v pronadj n prep 
int persp mod v persp prep 
n mod v persp prep 
n persp mod v persp prep 
n v n prep 
persp v qu n prep 
persp mod v art n prep 
persp mod neg v n prep 
persp mod v pronadj n prep 
pn v n prep 
pn v persp prep 
pron v neg qu n prep 
lOKeiiS « 79 
29 Times used * Frequency at lii 



Semantics:   { a | ( ∃<a,b> ∈ 

[COMBINE( [verb] , PREP )] ) ( b ∈ [np] ) } 

The preposition is taken to be a part of the 
meaning of the verb, and hence the function COMBINE is 
used. 

Consider the utterances represented by 
14 persp v persp prep 

(From: 11 persp v persp prep,adv) 

3 i dump it out. 

i cover them up. 
i covered them up. 
i eat em up. 
i eat him up. 
i get it out. 
i pushing it up. 
i take it out. 
you pull them up. 

(From: 2 persp v persp prep) 

he shave it off. 
i turn it on. 

(From: 1 persp v,aux persp prep,adv) 

i do them up. 

The function associated with (3,8) is apparently 
reasonable. 

(3,9) vp -> verb prepp 



TERMINAL FORMS 



Types   No. of          Form    Times rule used on form 

        Derivations             (If different from 1) 



13 1 mod persp v prep persp 

12 1 persp v prep persp 

 9 1 persp v prep pronadj n 

 8 1 persp v prep art n 

 7 1 persp v prep pron 
 7 1 persp mod v prep persp 

 7 1 persp aux v prep art n 

 6 1 persp v prep qu n 

 5 1 mod persp v prep n 

 4 1 persp aux v prep persp 

 3 1 mod persp v prep pron 

 3 1 mod persp v prep pronadj n 

 3 1 persp aux v prep n 

 3 1 persp mod v prep pron 

 3 1 persp aux v prep pronadj n 

 2 1 persp v prep n 

 2 1 persp v prep pn 

 2 1 persp v prep padj n 

 2 1 persp aux v prep qu n 

 2 1 persp mod v prep pronadj n 

 2 1 persp mod neg v prep pronadj n 



1 1 adj n V prep qu n 

1 1 arc n v prep art n 

1 1 aru n conj arc n v prep arc n 

1 1 conj art n v prep prou 

1 1 conj pn V prep qu pron 

1 1 conj persp v prep art a 

1 1 coaj art n v prep persp 

1 1 conj pn mod v prep persp 

1 1 conj persp aux v prep prOii 

1 1 coiij persp mod v prep art n 

1 1 conj persp aux v prep adj n 

1 1 conj pn conj pn aux v prep n 

1 1 conj per55p persp mod v pirep prouauj n 

1 1 int persp V prep pronadj n 

1 1 mod n V prep arc a 

1 1 mod proii v prep aru n 

1 1 mod persp v prep qu n 

1 1 mod persp v prep art n 

1 1 n mod V prep arc adj n 

1 1 n n mod v prep proa 

1 1 n n V prep pronadj n prep persp 

1 1 n persp v prep n 

i 1 n persp aux v prep arc n 

1 1 n pn mod neg v prep art n 

1 1 n V prep qu n 

1 1 n V prep art n 

1 1 n V prep persp 

1 1 n V prep proiiadj 

1 1 neg n pn v prep pronadj n 

1 1 neg persp aux v prep persp 

1 1 neg persp :nod v prep pronadj n 

1 1 persp mod v prep n 

1 1 persp V prep adj n 

1 1 persp moa v prep pa 




 1 1 persp aux v prep pron 

 1 1 persp mod v prep n 

 1 1 persp mod neg v prep n 

 1 1 persp mod v prep padj n 

 1 1 persp v prep n prep pron 

 1 1 persp v prep art adj pron 

 1 1 persp mod neg v prep pron 

 1 1 persp v prep persp prep n 

 1 1 persp mod v prep adj pron 

 1 1 persp aux v prep n prep n 

 1 1 persp mod neg v prep persp 

 1 1 persp mod neg v prep art n 

 1 1 persp aux neg v prep art n 

 1 1 persp v prep art n conj art n 

 1 1 persp mod v prep pron prep pron 

 1 1 pn mod v prep persp 

 1 1 pron v prep pn 

 1 1 pron padj n v prep n 

 1 1 pron v prep pronadj n 

 1 1 pron mod v prep persp 

 1 1 qu n v prep pronadj n 

 1 1 qu n mod v prep pronadj n 

 1 1 qu pron v prep n 

Types = 78          Tokens = 162 

Times used = 78     Times used * Frequency = 162 



Semantics:   { a | ( ∃<a,b,c> ∈ [verb] ) 

( c ∈ [prepp] ) } 



This rule is intended to be used when 'prepp' is 
indirect object to the verb, and the direct object of the 
verb is missing. 
Example: 



13 mod persp v prep persp 

(From: 9 mod persp v prep persp) 

3 let me talk to it. 
2 let me listen to it. 
  can i talk to it? 
  can i talk to him? 
  can i listen to it? 
  may i talk to it? 
(Remark: The above seem to reinforce my interpretation.) 

(From: 2 mod persp v,mod prep persp) 

can i go with him? 
let me go with you. 

(Remark: Here, the prepositional phrase is adverbial, so 

my semantics is incorrect.) 



(From: 1 mod persp v,mod prep,adv persp) 

can i go in it? 
(Remark: Again, an adverbial phrase.) 

(From: 1 mod#persp v prep,adv persp) 

lemme talk in it. 

The function for (3,9) is only partly successful. 
Notice, however, that GE1 does correctly disambiguate in 
the above utterances. 

Types = 57b XoKens 3 2497 
Times used s o42 Times used * frequency ^ 2'ou4 

Se:aanttcs; [v] 

(11,2) verb -> v neg 

TERMINAL FORMS 

Types   No. of          Form    Times rule used on form 

        Derivations             (If different from 1) 
persp v neg 
persp v neg mod 

art n mod neg v neg 

conj n pn v neg 

conj pron v neg 

conj persp mod neg v neg 

conj mod qu n prep n v neg persp 

n aux v neg persp 

persp v neg n 

persp v neg n n 

persp v neg adj n 

pn v neg 

pron v neg 

pron v neg qu n prep 
qu n v neg persp 
Tokens = 27 

17 Times used * Frequency = 



(D↑3 ∪ D↑2 ∪ D) ~ [v] 
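
The denotation of a negated verb, as the complement of [v] within D↑3 ∪ D↑2 ∪ D, can be sketched for a very small domain. This is an illustrative Python sketch only: the two-element domain, the names `universe` and `negate`, and the toy verb denotation are assumptions, and every element is represented as a tuple of length 1, 2, or 3.

```python
from itertools import product

# Hypothetical two-element domain, chosen only to keep the sets small.
D = {"a", "b"}

def universe(domain):
    """D^3 U D^2 U D, with every element represented as a tuple."""
    return {t for n in (1, 2, 3) for t in product(sorted(domain), repeat=n)}

def negate(verb, domain=D):
    """Negated verb: everything in D^3 U D^2 U D that is not in [v]."""
    return universe(domain) - verb

# A toy two-place verb that holds only of the pair <a,b>.
V = {("a", "b")}
NOT_V = negate(V)
```

With two individuals the universe has 2 + 4 + 8 = 14 elements, so the complement of a one-element verb denotation has 13. The sketch also makes visible that the complement contains tuples of every length, not only pairs.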



Rule (11,2) does not work correctly when used with 
rule (3,2). The only form using both rules (3,2) and 
(11,2) is 

1 pron v neg qu n prep 

representing 

1 this has not two children in. 

Apart from the fact of the strangeness of this utterance, 
notice that the semantics gives this denotation: 



If [pron] ⊆ 

{ a | ( ∃<a,b> ∈ [COMBINE( ((D↑3 ∪ D↑2 ∪ D) ~ [v]) , [prep] )] ) 

( b ∈ QUANTIF([qu],[n]) ) } 
then TRUE else FALSE. 



6 

6 
2 



Types m 
Tiiies used 



Semantics: 







This denotation fails to COMBINE the preposition with the 
verb until after the denotation of the verb has been 
computed. A more reasonable denotation is 

if [pron] ⊆ 

{ a | ( ∃<a,b> ∈ ((D↑3 ∪ D↑2 ∪ D) ~ [COMBINE(v,prep)]) ) 
( b ∈ QUANTIF([qu],[n]) ) } 
then TRUE else FALSE. 

This is, however, a relatively minor problem to fix. 



(19,1) linkp -> link 



Types a 1/b Tokens < 860 

Times used » 182 Times userl ♦ frequency s 86^) 



Semantics: 



[link] 



TERMINAL FORMS 

Types   No. of          Form    Times rule used on form 
        Derivations             (If different from 1) 



4 
3 
3 

2 
2 



persp link neg adj 
persp link neg art n 
persp link neg adv adj 
persp link neg n 
persp link neg qu adj 
2 
2 
2 
2 



Types = 23 
Times used 



persp link neg art adj n 
persp link neg prep persp 
pron link neg n 
pron link neg art n 
conj pron link neg adj 
conj persp link neg adj 
link neg persp adj 
neg persp link neg adj 
neg pron link neg art pn 
persp link neg n n 
persp link neg adj adj adj 
persp link neg art adv adj n 
persp link neg adj adj adj 
pn link neg adj 
pron link neg adj 
pron link neg adv adj 
pron link neg qu pron 
pron link neg art adj n 
Tokans « 30 

23 Times used ♦ frequency » 3o 



Semantics: 



[link] 



9. RULES FOR NOUN-PHRASES THAT STAND ALONE 
The nom and nomi rules add nothing to the 
semantical understanding of ERICA. Rather, they account 
for the observation that the generation of noun-phrases 
that stand alone seems to be different from the generation 
of noun-phrases that stand with predicates. 



(7,1) nom -> npsub prepp 



TERMINAL FORMS 

Types   No. of          Form    Times rule used on form 
        Derivations 
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7 
6 
5 
4 
3 
3 
3 
2 
2 
2 
2 



n prep art n 

pron prep persp 

pron prep pron 

pron prep n 

adj n prep persp 

n prep pn 

n prep persp 

art n prep persp 

n prep n 

pn prep art n 

pron prep pn 

adj adj n prep art n 



adv adj n prep pronadj n 
conj pron prep pron 
persp prep persp 
pn prep pn conj persp 
qu pron prep pronadj n 

Types = 17          Tokens = 45 

Times used = 17     Times used * Frequency = 46 

Semantics:   [npsub] ∩ [prepp] 




nom -> npsub conj npsub 



TERMINAL FORMS 

Types   No. of          Form    Times rule used on form 
        Derivations             (If different from 1) 


6 

4 

3 
2 
2 
2 



art n conj art n 
n conj n 
n conj art n 
pron conj pron 
n conj persp 
pn conj pn 
pn conj art n 
adj n conj n 
conj n conj n 
n conj pn 
n conj pron 
neg art n conj art n 
neg pron conj pron 
persp conj pn 
persp conj persp 
pn conj n 
1 pn conj persp 

1 pn conj pronadj n 

1 pronadj adj n conj art n 

1 pronadj n conj pronadj n 

20 Toicans a 3o 

Times used m 20 Times used ♦ ^frequency 



1 
1 
1 
1 

Types 



i8 



Semantics: 



( [npsub] ) ∪ ( [npsub] ) 



(7,4) nom -> nomi 



Types = 117         Tokens = 1343 

Times used = 118    Times used * Frequency = 1344 



Semantics : 



[nomi] 



TERMINAL FORMS 



Types   No. of          Form    Times rule used on form 
        Derivations             (If different from 1) 



66 

10 

7 

6 

6 

6 

5 

4 

3 

2 

2 



adj 
pronadj 
adj adj adj 
adj adj 
neg adj 
qu adj 
adv adj 
padj 
art adj 
art 

neg adj adj 

adj adj adj adj adj 

adj adj adj adj adj adj adj adj 

adv adv adj adj 

art adj adj 

conj pronadj adv adj 

int adv adj 






1 1 

1 1 
Types « 19 
Times used 



neg adv adj 
pronadj adj 
Tokens = 125 
23 Times used * Frequency 



Semantics:   [qadp] 



(18,1) nomi -> npsub 



Types = 117         Tokens = 1343 

Times used = 118    Times used * Frequency = 1344 



Semantics: 



Types ^ o7 ToKens a 264 

T4-<a'^ ' ^^ * Tiroes used ^ nejuc 



Semantics-. [**om1 J ["psubj 

10. RULES GENERATING SENTENCES 
The a-rules generate complete sentences. The 
addition of interjections and conjunctions, and the putting 
of sentences together into one utterance, are accomplished 
by the s-rules. 



(4,1) a -> nom 



TERMINAL FORMS 
Types   No. of          Form    Times rule used on form 
        Derivations             (If different from 1) 



553 
92 
90 
89 
66 
55 
43 
34 
30 
29 
18 
1 7 
17 
lb 
1 1 
1 1 
1 1 
1 0 
10 
8 
8 
8 
8 
7 
7 
7 
7 
7 
6 
6 

I ■ 

6 
6 
6 
6 
6 
5 
5 
5 
4 
4 
4 
4 
4 
4 



n 

art n 
n n 
pron 
adj 
adj n 
pn 

pronadj n 
qu pron 
qu n 
neg n 
perap 
pn n 

inc n 

aij adj n 
n n n 

pron art n 
iat n n 
pronad j 
nrt adj n 
art n n 
conj art n 
pn pn 

adj adj adj 
con j n 
n n n n n 
n- pn 

n prep art n 
adj adj 
adj pron 

art n conj art i* 

conj pn 

n conj n 

neg adj 

pron qu pron 

pron prep perap 

qu adj 

adv adj 

neg art n 

pron prep pron 

adj n n 

conj n n 

n conj art n 

neg n n 

padj 

pron prep n 
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1 

1 

I 

i 



4 1 qu n n 

3 1 adj n prap persp 

3 1 adj pn 

3 1 adv adj n 

3 1 art adj 

3 1 conj pron 

3 1 conj persp 

3 1 conj art adj n 

3 1 Int pn 

3 1 n int 

3 1 n n n n 

3 1 n persp 

3 1 n prep pn 

3 1 n prep persp 

3 1 nag proa 

3 1 persp n 

3 1 proa conj pron 

3 1 4U adj n 

3 1 qu adj n n 

3 1 qu pn 

3 1 qu pron qu pron 

3 1 qu pron qu pron qu pron 

2 1 "\ ^art 

2 IV art n prep persp 

2 1 A conj persp 

2 1 nnnnnnnnn 

2 1 nnnnnnnnnnnn 

2 1 n prep n 

2 1 ne^ adj adj 

2 1 neg qu n 

2 1 persp n persp 

2 1 pn conj pn 

2 1 pn conj art n 

2 1 pn pn n 

2 '1 pn prep art n 

2 'I pron qu n 

2 1 pron prep pn 

2 1 pronadj n pronadj u 

2 1 qu n pron , 

2 1 qu pron qu pron pron 



adj adj n n ! 
adj adj pron, 
adj adj adj n 
adj adj adj adj adj 
adj adj n prep art a 
adj adj adj adj adj adj adj adj 
adj n pn 
adj n int 
adj n adj n 
adj n conj n 
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1 1 ad J pron adj pron 

1 2 adv adv adj n 2 

1 5 adv adv adj adj 5 

1 1 adv adj n prep pronadj n 

1 1 aff n 

1 1 art ad adj 

1 1 art adj n n 

1 1 art adj proa 

1 1 art adj adj n 

1 1 art adj adj pron 

1 1 art adj adj adj n 

1 1 art n n n n 

1 1 conj qu n 

1 1 conj art n n 

1 1 conj n conj n 

1 1 conj pronadj n 

1 1 conj art adj adj n 

1 1 conj pron prep proa 

1 1 conj pronadj adv adj 

1 1 int adv adj 

1 1 int n pn 

1 1 int n adj n 

1 1 int n n n n 

1 1 Int pron 

1 1 mt pacsp 

1 1 int pronadj n 

1 1 int pron qu proa 

1 1 n adj n 

1 1 n con j pn 

1 1 n conj pron 

1 1 nnnnnn 

1 1 nnnnnnn 

1 1 nnnnnii nnnnnnn n 

1 1 nnnnnnnnnnnnnnnan 

1 1 n n n n persp nnniinnnnnnnnnn 

1 1 n n pn 

1 1 n padj n 

1 1 n pron 

1 1 n pronadj n n 

1 1 n qu n 

1 1 n qu pron 

1 1 neg adj n 

1 1 neg adv adj 

1 1 neg art n conj art n 

1 1 neg n pn 

1 1 neg pn 

1 1 neg pron conj pron 

1 1 padj n 

1 1 persp n n 

1 1 persp conj pn 

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Types 



73 



Times used 



persp adj pron 

persp Gonj persp 

persp prep persp 

persp art adj adJ n 

pn art n 

pn con J n 

pn conj persp 

pn conj pron ad J n 

pn ri pn n pn ii 

pn pn pn pn 

pn prep pn conj persp 

pron persp 

pron art pron 

pron art adj n 

pron art ad j aaj n 

pronadj adj 

pronadj pron 

pronadj n n n 

pronadj adj n 

pronadj adj n conj art n 

pronadj n coaj pronadj n 

qu adj adj n 

qu n qu n ' . . n n qu n 
qu pron prep pronadj n 
qu pron qu pron pron qu pron 
qu pron qu pron qu proii coaj 
qu pron qu proii qu pron qu n 
Tokens a 15^1 
178 Times usea * Frequency s Id^o 



Semantics:   [nom] 

Out of 3,085 utterances in ERICA, recall that 7,0^b 
were recognized by GE1. Of these, 1,b:>1 are noun-phrases 
that stand alone, as generated by the rule (4,1). Because 
of the interest in this class, I have included above all 
the forms. 



(4,2) a -> inter 



Types = 1           Tokens = 7 

Times used = 1      Times used * Frequency = 7 

Semantics:   IMMED ∩ [inter] 

The utterances using (4,2) are: 

4 what? 
2 what. 

(Remark: Presumably the utterance 'what.' should be a 
question.) 

1 who? 

The 'inter' words are the interrogative pronouns. 
The denotation of an 'inter' is the set of things in D 
that could satisfy the word. For example, [what] is the 
set of inanimate objects, and [who] is the set of animate 
(perhaps sentient) objects. The semantics for the rule 
says to intersect [inter] with IMMED. I think this is a 
reasonable approximation. 
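
The intersection IMMED ∩ [inter] can be illustrated with ordinary set operations. The sketch below is in Python and is an assumption-laden toy: the domain, the animate set, and the contents of IMMED are invented, and IMMED is read here as the set of objects in the immediate context of the utterance, which is an interpretation rather than a definition taken from this section.

```python
# Hypothetical model; none of these individuals come from the ERICA corpus.
D = {"erica", "mommy", "ball", "cup", "moon"}
ANIMATE = {"erica", "mommy"}

# Assumed reading: the objects present in the immediate context.
IMMED = {"erica", "ball"}

# [who] = animate objects; [what] = inanimate objects.
INTER = {
    "who": ANIMATE,
    "what": D - ANIMATE,
}

def answer(inter_word):
    """Denotation of a bare interrogative under rule (4,2): IMMED n [inter]."""
    return IMMED & INTER[inter_word]
```

Under these assumptions 'what?' picks out the inanimate objects currently at hand and 'who?' the animate ones, which is the sense in which the intersection approximates the meaning of the question.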

(4,3) a -> subj vbl 
rypes « 380 Tokens « 15:^8 

Times used » 424 Times used * t requeacy lo 7b 

Semantics: 

IF ( [subj] ) ⊆ ( [vbl] ) 
THEN TRUE ELSE FALSE 

(4,4) a -> inter vbl 



TERMINAL FORMS 



Types   No. of          Form    Times rule used on form 

        Derivations             (If different from 1) 



5 1 inter aux v prep 

3 1 inter v 

1 1 inter v persp 
Types = 3           Tokens = 9 

Times used = 3      Times used * Frequency = 9 

Semantics:   [inter] ∩ [vbl] ∩ IMMED 

For example, 

5 [inter aux v prep] = 

[inter] ∩ [aux] ∩ 

[COMBINE([v],PREP)] ∩ [IMMED] 

represents 

(From: 3 inter#aux,inter#link v,mod prep) 

3 what's going on? 

(From: 2 inter#aux,inter#link v prep,adv) 

2 what's happening outside? 

(Remark: Here, lexical disambiguation by GE1 has chosen 
that 'outside' is a preposition; it is more correctly an 
adverb.) 



Rule (4,4) seems reasonably successful. 
(4,5) a -> subj linkp prepp 



TERMINAL FORMS 

Types   No. of          Form    Times rule used on form 

        Derivations             (If different from 1) 



 4      persp link prep art n 
 4      pron link prep pn 
 4      pron link prep persp 
 2      persp link neg prep persp 
 2      persp link prep pronadj n 
        art n link prep persp 
        conj persp link prep art n 
        int persp link prep persp 
        n link prep persp 
        persp link prep n 
        persp link prep pron 
        persp link prep art pron 
        persp link prep art adj n 
        pn link prep art n 
        pron link prep art n 
        pron link prep pronadj n 

Types = 16          Tokens = 27 

Times used = 16     Times used * Frequency = 27 



Semantics: 

IF ( [subj] ) ⊆ ( AUXFCN( [linkp] , [prepp] ) ) 
THEN TRUE ELSE FALSE 

An interesting case involving the negating particle 
'neg' is: 



2 [persp link neg prep persp] = 
If [persp] ⊆ 

( AUXFCN( [link neg] , 
{ a | ( ∃<a,b> ∈ [prep] ) 
( b ∈ [persp] ) } ) ) 
then TRUE else FALSE 

representing 
(From: 2 persp#aux,persp#link neg prep persp) 

2 it's not for me. 

This is not implausible. 

(4,6) a -> inter linkp 



Types = 1           Tokens = 3 

Times used = 1      Times used * Frequency = 3 



Semantics: 



[inter] ∩ AUXFCN( [linkp] , IMMED ) 



(4,7) a -> mod subj 



TERMINAL FORMS 

Types   No. of          Form    Times rule used on form 
        Derivations             (If different from 1) 



 5 2 mod persp 
 2 2 mod pron 
 1 2 mod pronadj n 
 1 2 neg mod persp 

Types = 4           Tokens = 

Times used = 4      Times used * Frequency 



Semantics: 

IF ( [subj] ) ⊆ ( [mod] ) 
THEN TRUE ELSE FALSE 



TERMINAL FORMS 

Types   No. of          Form    Times rule used on form 
        Derivations             (If different from 1) 



30      prep art n 
18      prep n 
13      prep pronadj n 
 9      prep persp 
 5      prep pn 
 3      prep pron 
 3      prep padj n 
        aff prep n prep persp 
        int prep persp 
        int prep art n 
        neg prep qu n 
        neg prep persp 
        neg prep padj n 
        neg prep pronadj n 
        prep adj 
        prep art adj n 
        prep pn conj pn 
        prep pronadj n conj n 

Types = 18          Tokens = 92 

Times used = 18     Times used * Frequency = 92 



Semantics:   [prepp] 



TERMINAL FORMS 

Types   No. of          Form    Times rule used on form 

        Derivations             (If different from 1) 



4 1 link persp adj 

1 1 link pron 

1 1 link pron adj 

1 1 link neg persp adj 

Types = 4           Tokens = 

Times used = 4      Times used * Frequency = 7 



Semantics: 

If ( [subj] ) ⊆ ( AUXFCN( [linkp] , [qadp] ) ) 
then TRUE else FALSE 



Consider, for example, 
4 [link persp adj] = 

if [persp] ⊆ 

( AUXFCN( [link] , [qadp] ) ) 

then TRUE else FALSE 

representing 

(From: 4 link,aux persp adj) 

2 are they blue? 

are they good? 
is it warm? 



Notice that all these utterances are questions. 
Since, by convention, the meaning of a question is its 
answer, the semantics works correctly. 

One can explain the apparently puzzling claim that 
the meaning of a question is its answer by allowing that 
Erica will understand the structure of her data base (the 
model) without necessarily knowing all the details of 
that data base. 

Of course, questions are different from declarative 
statements in that they require a different response from 
the other party(ies), but this is no problem. 
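
The convention that the meaning of a question is its answer can be sketched as a truth-valued subset test. The Python below is illustrative only. AUXFCN is defined in Chapter 5 and is not reproduced in this section, so the stand-in `auxfcn` is a loudly labeled assumption that simply returns the predicate set unchanged for an affirmative linking verb; the individuals and the set BLUE are likewise invented.

```python
def auxfcn(link, predicate):
    """Stand-in for AUXFCN (assumption, not the Chapter 5 definition).

    The linking verb is assumed to leave the predicate set unchanged.
    """
    return predicate

def question_value(subject, link, predicate):
    """IF [subj] is a subset of AUXFCN([link], [qadp]) THEN TRUE ELSE FALSE."""
    return subject <= auxfcn(link, predicate)

# Hypothetical model for 'are they blue?'.
BLUE = {"block1", "block2", "cup"}
THEY = {"block1", "block2"}
IT = {"ball"}
```

On this reading 'are they blue?' denotes TRUE in the toy model and the analogous question about the ball denotes FALSE; the truth value is the answer, and it is computed from the structure of the model exactly as for the corresponding declarative.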



(4,10) a -> linkp subj np 



TERMINAL FORMS 

Types   No. of          Form    Times rule used on form 
        Derivations             (If different from 1) 



 5      link pron art n 
 2      link persp n 
 2      link pron persp 
 2      link persp art n 
 2      link pron pronadj n 

Types = 5           Tokens = 13 

Times used = 5      Times used * Frequency = 13 



Semantics: 

IF ( [subj] ) ⊆ ( AUXFCN( [linkp] , [np] ) ) 
THEN TRUE ELSE FALSE 



The intended interpretation is that 'subj' is the 
subject, and that 'np' is a predicate nominative. Notice 
that no utterance uses 'link neg', which is a possibility 
in grammar GE1. 

5 [link pron art n] = 

if [pron] ⊆ AUXFCN( [link] , QUANTIF( [art] , [n] ) ) 
then TRUE else FALSE 
represents 

(From: 5 link,aux qu,pron art n) 

is this a mom? 

is that a rat? 

is that a man? 

is this a daddy? 

is that a pumpkin? 



This is a plausible interpretation for these 
utterances, which are all questions. 
(4,11) a -> subj linkp np 



Types = 76          Tokens = 342 

Times used = 76     Times used * Frequency = 342 



Semantics: 

IF ( [subj] ) ⊆ 

( AUXFCN( [linkp] , [np] ) ) 
THEN TRUE ELSE FALSE 

Here, 'subj' is again the intended subject, and 
'np' the predicate nominative. 

Consider 

3 [persp link neg art n] = 

If [persp] ⊆ 

AUXFCN( [link neg] , QUANTIF( [art] , [n] ) ) 
then TRUE else FALSE 

which represents 

(From: 2 persp link,aux neg art n) 

1 he is not a puppet. 

1 i am not a bear. 

(From: 1 persp#aux,persp#link neg art n) 

1 i'm not a girl. 

(4,12) a -> subj linkp qadp 



Types M 39 ToKens » 1 3k 

Times used = '41 Times used ♦ Frequency « 13:? 



Semantics: 

IF ( [subj] ) ⊆ ( AUXFCN( [linkp] , [qadp] ) ) 
THEN TRUE ELSE FALSE 



The 'qadp' is a predicate adjective phrase in rule 
(4,12). 

(4,13) a -> auxilp subj vp 



Types = 64          Tokens = 181 

Times used = 72     Times used * Frequency = 192 



Semantics: 

IF ( [subj] ) ⊆ ( AUXFCN( [auxilp] , [vp] ) ) 
THEN TRUE ELSE FALSE 

(4,14) a -> subj np vbl 



Types = 43          Tokens = 55 

Times used = 45     Times used * Frequency = 57 



Semantics: 

IF ( [subj] ) ⊆ 

{ a | ( ∃<a,b> ∈ [vbl] ) ( b ∈ [np] ) } 
THEN TRUE ELSE FALSE 



(4,15) a 22 s^b j IJ ^n k p ng 



TERMINAL FORMS

Types   No. of         Form                              Times rule used on form
        Derivations                                      (if different from 1)

19      1              pron link art n n
6       1              pron link n n
3                      persp link n n
3                      pron link art pn n
2                      pron link art n pron
2                      pron link pronadj n n
2                      pronadj n link n n
1                      conj pron link art n n
1                      conj pron link art pn n
1                      conj persp link art adj n n
1                      neg pron link pn n
1                      neg persp link art pn n
1                      persp link n qu n
1                      persp link neg n n
1                      persp link art n n
1                      persp link adv adv adj pron n
1                      pron link pn n
1                      pron link prep art n
1                      pron link persp n conj
1                      pron link pn pn conj pn
1                      pron link pron qu adj n
1                      pron link art adj pron art n

Types = 22    Tokens = 52
Times used = 23    Times used * Frequency = 5i



Semantics:

IF ( [subj] ) ⊂
( AUXFCN( [linkp] , ( [np] ∩ [np] ) ) )
THEN TRUE ELSE FALSE

The intended semantics is based on the assumption
that the two noun-phrases are in apposition. Consider the
utterances represented by



19 pron link art n n

some of which are

(From: 18 pron#aux,pron#link art n n)



4 

2 



there's a kitty cat.
there's a tape recorder.
that's a tea pot.
that's a music cat.



Notice that the apposition interpretation is not
contradicted, although some combinations should be listed
as single words (such as 'kitty#cat', 'tape#recorder').
Moreover, 'there's' and 'that's' and similar demonstrative
phrases should be given a better classification than
'pron#aux,pron#link'.

(4,16) a -> auxilp subj np

TERMINAL FORMS

Types   No. of         Form
        Derivations

1       1              aff mod persp n
1       1              mod art n n
1       1              mod neg n pron
1       1              mod neg persp n
1       1              mod persp n

Types = 5    Tokens = 5
Times used = 5    Times used * Frequency = 5



Semantics:

IF ( [subj] ) ⊂ { a | ( ∃<a,b> ∈
AUXFCN( [auxilp] , IMMED ) ) ( b ∈ [np] ) }
then TRUE else FALSE

The intention is that these utterances are missing 
their main verbs. Consider 
1 mod art n n 

which represents 

maybe the milk man. 



Here it is plausible that the main verb is missing but
assumed as a part of the 'context'. It is quite possible
that this semantics should have several contextual
parameters, representing, say, objects, properties,
actions, under immediate consideration. I have used only
the set IMMED to indicate the presence of a contextual
parameter. The idea of extending this to several
contextual parameters is straightforward. The
implementation may be rather involved and is beyond the
scope of this work.



(4,19) a -> auxilp subj

Types = 12    Tokens = 38
Times used = 14    Times used * Frequency = 40

Semantics:

IF ( [subj] ) ⊂ ( AUXFCN( [auxilp] , IMMED ) )
THEN TRUE ELSE FALSE



TERMINAL FORMS

Types   Form

229     v
3       int v
3       neg v
2       v int
2       v neg
1       v aff

Types = 6    Tokens = 240
Times used = 6    Times used * Frequency = 240

Semantics: If IMMED ⊂ [verb]

then TRUE else FALSE

In these utterances the verb stands alone. For

229 v

the utterances are a simple verb. Examples:

70 lookit.
(Remark: Probably an imperative.)

70 know.
(Remark: Short for 'i don't know', according to the
context.)

21 see.

The function for (4,20) works in many cases;
'lookit' and 'know' are notable failures.

Moreover, two utterances contain a negating
particle:

3 neg v

2 v neg

For these utterances, it seems reasonable that the negating
particle affects the verb. This semantics views these as
being paired-denotation utterances, viz.:

[neg v] = <FALSE, [v]>

and hence the denotations given to these utterances are
incorrect.
TERMINAL FORMS

Types   Form

2       intadv mod persp v
1       intadv aux qu n v
1       intadv aux persp v
1       intadv aux art n v
1       intadv aux pronadj n v
1       intadv mod neg persp v n

Types = 6    Tokens = 7
Times used = 6    Times used * Frequency = 7

Semantics: MEASURE( <auxilp#VP,INTADV> , ( [subj] ∩
AUXFCN( [auxilp] , [vp] ) ) , [intadv] )



The functions given for the interrogative adverbs
are not well thought out. The utterances are questions
inquiring into such matters as 'where', 'when', or 'how' an
action took place.

Consider

[intadv mod persp v] =

MEASURE( <auxilp#VP,INTADV> , ( [persp] ∩
AUXFCN( [mod] , [v] ) ) , [intadv] )

representing

(From: 2 intadv#mod persp v,mod)

2 where'd it go?

(Remark: 'where'd' is here an 'intadv#mod'.)
The rule says:

1) Compute AUXFCN( [did] , [go] ). This gives us
the set of all things that 'did go'.

2) Intersect this with [it].

3) Now, compute the adverbial function MEASURE on
the arguments.

I leave the structure of adverbs in general and
interrogative adverbs in particular as an unsolved problem.
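
The three steps for 'where'd it go?' can be sketched as a small pipeline. Because the structure of MEASURE is left open in the text, the sketch takes it as a function supplied by the caller and drops its other arguments. The `auxfcn` stand-in (which lets 'did' leave the verb's denotation unchanged), the `toy_measure` lookup, and all toy data are assumptions made only for illustration.

```python
def auxfcn(aux, verb_set):
    # Hypothetical stand-in for AUXFCN: the auxiliary 'did' is assumed
    # to leave the verb's denotation unchanged.
    return set(verb_set)


def where_did_it_go(did, go, it, measure):
    movers = auxfcn(did, go)       # step 1: things that 'did go'
    candidates = movers & it       # step 2: intersect with [it]
    return measure(candidates)     # step 3: apply the adverbial function


# Toy adverbial function: look up where each candidate went.
places = {"ball": "under the bed", "dog": "outside"}


def toy_measure(things):
    return {places[t] for t in things if t in places}


answer = where_did_it_go("did", {"ball", "dog"}, {"ball"}, toy_measure)
```

Passing the adverbial function in as a parameter keeps the sketch neutral about how MEASURE should eventually be defined.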




TERMINAL FORMS

Types   No. of         Form                              Times rule used on form
        Derivations                                      (if different from 1)

5                      intadv aux art n
4                      intadv aux pronadj n
2                      intadv aux n
2                      intadv aux pron
2                      intadv aux pronadj adj n
1                      intadv aux qu n
1                      intadv aux persp
1                      intadv aux art pron
1                      intadv aux adj n
1                      intadv aux art adj n
1       2              intadv aux art n prep art n

Types = 11    Tokens = 21
Times used = 12    Times used * Frequency = 22



Semantics: 



MEASaRE( <auxilp#IMMED,INrAJV;» 



AUXr*Ct^( [auxil,^] 



dvj ) 



A few examples:

(From: 4 intadv#aux,intadv#link art n)

1 where's a arrow?

1 where's an arrow?

1 where's the lady?

1 where's the buttons?

(From: 4 intadv#aux,intadv#link pronadj n)

1 where's my toys?

1 where's my door?

1 where's his sack?

1 where's my pillow?



(4,23) a -> intadv

TERMINAL FORMS

Types   No. of         Form
        Derivations

3       1              intadv

Types = 1    Tokens = 3
Times used = 1    Times used * Frequency = 3

Semantics: MEASURE( <IMMED,INTADV> , IMMED , [intadv] )

The utterances using (4,23) are:

1 how...

1 where?

1 why?



Types a 31 Tokens » 251 

Times used = 40 Times U3«=id * cre^uency x Io5 



Semantics:

IF ( [subj] ) ⊂ ( [verb] )
THEN TRUE ELSE FALSE



(4,25) a -> advp subj auxilp

TERMINAL FORMS

Types   Form

27      adv persp aux
24      adv persp mod
4       int adv persp aux
1       adv art n aux
1       adv n aux
1       conj adv persp aux

Types = 6    Tokens = 58
Times used = 6    Times used * Frequency = 58



IF ( [subj] ) ⊂
MEASURE( <auxilp,ADVP> ,
AUXFCN( [auxilp] , IMMED ) , [advp] )
THEN TRUE ELSE FALSE



Some utterances using (4,25):

27 adv persp aux

(From: adv persp link,aux)

10 there it is.

7 there he is.

5 there they are.

1 here he is.

1 here it is.

1 here we are.

1 here they are.

1 there '^p are. 
These utterances represent a failure of lexical
disambiguation. Here, the adverbs (all locatives) modify
the linking verbs, but the grammar disambiguates to the
auxiliary.



24 adv persp mod

(From: 24 adv persp v,mod)

7 here we go.

5 there you go.

4 here i go.

4 there we go.

1 here you go.

1 there i go.

1 there it go.

1 there they go.

Here the verb is an action verb, but the adverb
doesn't modify at all. The words 'here' and 'there' act as
interjections in the utterances.

4 int adv persp aux

(From: 4 int adv persp link,aux)

4 oh, there it is.

Again, the verb is not an auxiliary, so lexical
disambiguation has failed.



(4,28) a -> subj auxilp

Types = 17    Tokens = 82
Times used = 17    Times used * Frequency = 82

Semantics:

IF ( [subj] ) ⊂ AUXFCN( [auxilp] , IMMED )
THEN TRUE ELSE FALSE



(4,29) a -> advp

TERMINAL FORMS

Types   No. of         Form
        Derivations

67      1              adv
13      1              adv adv
3       1              neg adv
2       1              int adv
1       1              adv adv adv
1       1              conj adv

Types = 6    Tokens = 87
Times used = 6    Times used * Frequency = 87



29 
18 

2 

2 
i 

1 
i 

1 

(From: 

3 
2 
2 
1 
1 




prep,aav adv) 



in here. 
In nera. 
tander there, 
in there, 
out yonder. 
as such in the dictionary, since they seem sufficiently
unanalysable. Alternatively, 'here', 'there', and 'yonder'
could be thought of as nouns denoting places, as objects of
the prepositions involved.

(4,30) a -> inter subj

TEHMINAL FORMS 

Types No. of /form Ximes rule used on form 

Derivations / (If aitferent xrom 1) 

//' 

ihter pron 
/inter a 
inter qu n 
/ inter persp 
/ inter pronadj a 
2 inter pron prep arc n l 

Types m hi Tokens * 37 i — ^ 

Times us^ ^ 7 limes used * Frequency • ^ 

Semantics: [inter] ∩ [subj] ∩ IMMED

Some examples:

(From: 32 inter qu,pron)

17 what that?

7 what this?

3 who that?

3 who this?

2 what those?

(4,31) a -> inter linkp subj
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Types NO* ojl 

Derivations 



CO tin fxmea rux^ asfeci j*. j^^^ch 



1 91 

1 a 

b 
4 



1 

1 2 
Types a 10 
Times used 



incer linK pron 
inter link persp 
inter link qu n 
conj inter link pron 
inter IxnK pronadj n 
inter link qu pron 
int inter link pron 
inter link art >^ \ 
inter link qu adj n\^ 
inter link pron prap p^onadj a 2 
Tokens » 228 

11 Times used ♦ f requency « 22^ 



Semantics:

[inter] ∩ [subj] ∩
AUXFCN( [linkp] , IMMED )



r >t ^ i A ^ • 



3 b wnat's thisV 

8 what's those?

5 who's that?

3 who's this?

1 what's this?

1 who's those?



(^4, 32) a 22. intt^jc vul 



Types No. of 

Derivations 



TERMINAi- tOKMS 
Form 



i imes rule or 



1 

1 



inter persp v 

inter persp v prep 

iat^sr pron v 

later pexsp mod v 

1 4;tfti r'tiT ' t''- * 
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Types s 5 rokens » 10 

Times used » 5 Times useJ ♦ t^reiuericy = 10 



Semantics: [inter] * 0 . v ^ ^ r....-r. 

{ a ! ( a<a,b> e [vbi] )( o e in^] ) i 0 LAAc.D 



Some utterances using (4,32):

(From: 2 inter persp v,aux)

1 what i have.

1 what she have.

(Remark: These do appear to be fragmentary, but instead of
being main clauses simply missing a main verb, they seem to
be subordinate clauses.)

(4,33) a -> advp subj vbl



Types * 7 Tokens = 26 

Times used = 7 Times used * frequency :s 



Semantics: 

IF ( [subj] ) C - 1 > 

THEx. TRUE ELSE FALSE 



Example:

(From: 15 adv persp v)

5 there he goes.

2 here i come.

2 here he goes.

2 there it goes.

1 here she goes.

1 there it fits.

1 there he stands.

1 wherever she goes.



11 tJll 5 Zl vol suLi ijree 



TERMINAL FORMS

Types   Form

34      v persp prep
5       v pron prep
4       v art n prep
2       v pronadj n prep
1       int v persp prep
1       mod neg v pronadj n prep
1       neg v persp prep
1       v n prep
1       v persp n prep
1       v pn prep
1       v prep pronadj n prep
1       v qu n prep

Types = 12    Tokens = 53
Times used = 12    Times used * Frequency = 53



Semantics:

IF ( [subj] ) ⊂ [COMBINE( [vbl] , PREP )]
THEN TRUE ELSE FALSE



( From: 

7 
1 
2 
2 



Exai^ples: 

20 V persp prep^adv) 

turn it up.
eat me up.
pick it up.
pick them up.
eat it up.
eat them up.
put it away.
take it up.
take it out.
take him out.



(4,37) a -> verb subj np

Types = 21    Tokens = 38
Times used = 24    Times used * Frequency = 41

Semantics:

IF ( [subj] ) ⊂
{ a | ( ∃<a,b> ∈ [verb] ) ( b ∈ [np] ) }
THEN TRUE ELSE FALSE

The intended interpretation is that the 'subj' is a
subject, and the 'np' is the direct object. Some mixed
results follow.



(From: 9 v persp n)

2 did you, mommy.

2 thank you, mommy?

1 bring me curl.

1 drink it, doggie.

1 look it now.

1 make me fishy.

1 make me bubbles.

(From: 4 v art n n)

1 draw... a kitty cat.

1 see a tape recorder.

1 see the bunny rabbits.

1 tell the tape recorder.

Several of these are imperatives, with the 'subj'
an indirect object; several others show nouns of direct
address. The results of using this rule appear
mixed.

(4,38) a -> intadv subj vbl



\ 
TERMINAL FORMS

Types   Form

13      intadv persp v
1       intadv art n v
1       intadv persp v persp
1       intadv persp v art adj n

Types = 4    Tokens = 16
Times used = 4    Times used * Frequency = 16

Semantics:

MEASURE( <VBL,INTADV> , [subj] ∩ [vbl] ,
[intadv] )



Some examples:

(From: 13 intadv persp v,mod)

6 where it go?
4 where they go?
1 where i go?
1 where you go?
1 where he going?

(4,39) a -> auxilp v



TERMINAL FORMS

Types   Form

23      mod neg v
8       mod v
1       mod neg v int

Types = 3    Tokens = 32
Times used = 3    Times used * Frequency = 32



Semantics:

IF ( IMMED ) ⊂ AUXFCN( [auxilp] , [v] )
THEN TRUE ELSE FALSE
The intended interpretation is that the utterance
is missing its subject. Some examples:

(From: 22 v#neg,mod#neg v)

22 don't know.

(From: 1 mod v)

2 wanna see.

1 wanna see?

(4,40) a -> advp linkp subj



Types « 12 lOKens » 34 

ri^**3 1^ I'l'^e?? used * Fre^uenrv =~ i4 

Semantics:

IF ( [subj] ) ⊂
MEASURE( <IMMED,ADVP> ,
AUXFCN( [linkp] , IMMED ) , [advp] )
THEN TRUE ELSE FALSE



Types = 3    Tokens = 12

Times used = 3    Times used * Frequency = 12



Semantics: 

li^ ( 7MMED ) C A'JXfCW( L^^^"-<PJ 
THEN TRUE FLS^ FAi»SP: 
(4,42) a -> inter linkp advp

TERMINAL FORMS

Types   No. of         Form
        Derivations

9       1              inter link adv adv
5       1              inter link adv

Types = 2    Tokens = 14
Times used = 2    Times used * Frequency = 14

Semantics: [inter] ∩
MEASURE( <linkp,ADVP> ,
AUXFCN( [linkp] , IMMED ) , [advp] )

Some utterances using (4,42):

(From: 9 inter#aux,inter#link prep,adv adv)

5 what's in there?

2 what's under there?

1 what's in here?

1 what's out there?

(Remark: Dictionary problems.)

(From: 4 inter#aux,inter#link adv)

3 who's here?

1 what's there?



Types a 4 Tokens « 10 

Times used » 5 Timers u<9ed ♦ ?tBqM^v\cy ^ ' 
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Semantics: 

Ir ( [s^ibjj ) C AJXFCN( [auxiii-i , :^v^^j 
THEN TRUE ELSE t^LSE 



(4,44) a -> inter auxilp np verb

TERMINAL FORMS

Types   No. of         Form
        Derivations

12      1              inter aux persp v
10      1              inter aux pron v
4       1              conj inter aux persp v
2       1              inter mod persp v
1       1              conj inter aux pron v
1       1              inter aux qu n v

Types = 6    Tokens = 30
Times used = 6    Times used * Frequency = 30



{ a 1 ( a <a,D.> e ^ 
( b € [np] ) ] n IMMED 

(4,45) a -> subj linkp

Types = 11    Tokens = 32
Times used = 11    Times used * Frequency = 32

IF ( [subj] ) ⊂ AUXFCN( [linkp] , IMMED )
THEN TRUE ELSE FALSE

Some examples:

(From: 7 persp link#aux)

4 i am.

1 it is.

1 we are.

Here the necessity of the contextual parameter
IMMED is clear: 'i am' is (probably) not a declaration of
existence, but rather asserts that 'i' has some property or
another. Again, I feel that having several contextual
parameters available will make a needed distinction here.

11. PREPOSITIONAL PHRASE GENERATION

(12,1) prepp -> prep np

Types = 236    Tokens = 479
Times used = 319    Times used * Frequency = 603

Semantics: { a | ( ∃<a,b> ∈ [prep] ) ( b ∈ [np] ) }

12. SUBJECTS OF SENTENCES.
The subj rules generate subjects. No new semantic
content is contained in these rules.

Types » 823 Tokens » 3342 

Timds used « 883 Times used * frequency = ^^4.. 



Semantics: 
(6,2) subj -> np prepp

TERMINAL FORMS

Types   No. of         Form                                      Times rule used on form
        Derivations                                              (if different from 1)

4       2              v persp prep art n
2       2              v persp prep n
2       2              v pron prep persp
1       2              aux n prep art n
1       2              aux pron prep art n
1       2              conj art n prep persp v art n
1       2              conj mod qu n prep n v neg persp
1       2              intadv aux art n prep art n
1       2              inter pron prep art n
1       2              inter link pron prep pronadj n
1       1              pronadj n conj pronadj n prep persp v
1       2              v art n prep pron
1       2              v n prep n
1       2              v n prep persp
1                      v persp prep persp
1                      v persp prep n prep art n n               2
1       2              v pron prep art n n
1       2              v pronadj n prep pronadj n
1       2              v qu n prep art n

Types = 19    Tokens = 24
Times used = 20    Times used * Frequency = 25

Semantics: [np] ∩ [prepp]

Notice that all but one of the forms using (6,2)
are grammatically ambiguous. This is because the rule is
not really necessary, except for the form

1 pronadj n conj pronadj n prep persp v

where no alternative derivation exists. Semantically
there is no problem since the ambiguity does not affect the
semantics. See Section 2 for discussion of ambiguity.

Some examples of utterances using (6,2):

(From: 4 v persp prep art n)

2 put it on the microphone.

1 thank you for a daddy.

1 thank you for a dinner.

The intended interpretation of the semantics for
(6,2) is that the prepositional phrase modifies the noun
phrase. This is usually not the case, so the rule is
incorrect.

13. UTTERANCE-GENERATING RULES

The symbol 's' is the start symbol of the grammar
GE1.

(8,1) s -> a

Types = 836    Tokens = 5037
Times used = 914    Times used * Frequency = 5102

Semantics: [a]

(8,2) s -> aff int

Types = 1    Tokens = 541
Times used = 1    Times used * Frequency = 541

Semantics: TRUE

The original utterances for rule (8,2) are:



532 ah huh. 
8 uh num. 



1 umrnm eeK. 

Clearly, these phrases should be reclassed in the
dictionary.

Having a single rule in the grammar to account for
these costs nothing, but it doesn't prove anything either.
Rule (8,2) simply says that these sentences are
grammatical.

(8,4) s -> neg a



Types No* of 

Derlvat Ions 



PEKMINAL FOnMS 

, Form Txmea rule usea on rorm 
(li. alifere.^t from 1) 



18 
8 
6 
o 

4 
'4 
4 

3 
3 
3 
2 
2 
2 
2 
2 
2 
2 



neg n 

neg persp aux nag 
neg aJj 

neg pron link art n 
neg art n 

neg mod persp v persp 
neg h n 

neg persp mod neg 
neg adv 
neg pron 
neg v 

neg adj adj 

neg pron ^llnK n 

neg persp link n 

neg persp llnK art a 

neg pron link art adj n 

neg persp mod neg v prep 

neg qu n 

neg adj n 

neg adv adj 

neg art n conj art a 

neg mod persp 

neg mod persp v pron 

neg mod persp v prep persp n 

neg n v 



2^4 



Types = 60 
Times used 

o 



neg n pn 

neg n mod neg 

neg n pn v prep pronadj n 

aeg pn 

neg persp v n 
neg prep qu n 
neg pron link 
neg prep persp 
neg prep padj n 
neg persp v ron 
neg persp v parsp 
neg persp v art n 
neg persp v adj n 
neg persp link pn 
neg prep pronadj n 
neg persp link adj 
neg pron conj pron 
neg pron link pn n 
neg pronadj n aux v 
neg persp anx neg v 
neg persp n.od v persp 
neg pron link pronadj 
neg persp v persp pron 
neg persp linK neg adj 
neg persp linic art pn a / 
neg pron link neg art P^K / 
neg persp link art adj ^ 
neg persp mod negzv t^roa \ 
neg persp aux v prep persp 
neg pron link pronadj aaj n 
neg persp mod v prep pronadj n 
neg persp mod neg v pron prep pron 
neg v n 
neg v pron 
neg v persp prep 
Tokens 120 
62 Times used ♦ rraquency = 122 



Semantics: < FALSE , [a] >

The semantics for this rule is based on the
assumption that the utterance is first a negating word
(expressing a "complete thought"), followed by a complete
sentence. The sentence often explains or elaborates upon
the negating word.

For example, the form

6 neg pron link art n

represents the utterances



2 no, that's a butterfly.

no, that's a boy.

no, that's a bear.

no, that's a clock.

no, that's a oceah. 

Such utterances must, I believe, be given paired
denotations in order to be sensible.

(8,5) s -> aff a



TERMINAL FORMS

Types   Form

11      aff persp v
9       aff persp mod
5       aff persp link
1       aff mod persp n
1       aff n
1       aff pron link
1       aff persp link adj
1       aff persp link art n
1       aff persp mod v persp
1       aff prep n prep persp

Types = 10    Tokens = 32
Times used = 10    Times used * Frequency = 32



Semantics: < TRUE, [a] >

Rules (8,5) and (8,6), which follows, have paired
denotations for their semantics. Some utterances using
(8,5):

(From: 9 aff persp mod)

yes, you can.
ok, i will.
yes you will.
yes, i can.
yes, he can.
yes, it might.
yes, she would.

(8,6) s -> a aff



TERMINAL FORMS

Types   No. of         Form
        Derivations

1       1              persp mod neg v n aff
1       1              v aff

Types = 2    Tokens = 2
Times used = 2    Times used * Frequency = 2



Semantics: 



(8,7) s -> neg

TERMINAL FORMS

Types   No. of         Form
        Derivations

364     1              neg

Types = 1    Tokens = 364
Times used = 1    Times used * Frequency = 364



Semantics: FALSE

All 364 of the uses of rule (8,7) represent

364 no.



Types = 1    Tokens = 358
Times used = 1    Times used * Frequency = 358

Semantics: TRUE

Utterances involved:

92 uhuh.
66 OK.
59 uh.
41 yeah.
40 yes.
13 yep.
7 yeh.

6 umm. 

2 urnmm* 
uhrnnm* 
uiihmmmm* 

The proliferation of these words is not
particularly useful for semantics research. It is likely
that the editor meant to indicate different pronunciations.

(8,9) s -> int

Types = 1    Tokens = 240
Times used = 1    Times used * Frequency = 240

Semantics: ∅

The semantics for an interjection is here
considered to be nothing: the empty set. Some examples
follow:

92 oh.
44 umhum.
(Remark: 'umhum' is probably an affirmative word.)
10 um.
(Remark: 'um' is probably an affirmative also.)
<^ hi* 

Types = 1    Tokens = 4
Times used = 1    Times used * Frequency = 4

Semantics: ∅

These are probably fragments. The utterances using
(8,10) are:

2 and...
1 but...
1 even...

(8,11) s -> aff aff



Types = 1    Tokens = 42
Times used = 1    Times used * Frequency = 42

Semantics: < TRUE, TRUE >

The purpose of this rule was to capture two
affirmations in one utterance. The original utterances
are:

41 uh uh.

1 yeah...yeah.

'uh uh' is clearly just one word. 'yeah...yeah' could
conceivably be two separate statements, but the context
rules this out. Hence, this rule tries to capture a
distinction that simply isn't present in ERICA.

(8,12) s -> int int

Types = 1    Tokens = 59
Times used = 1    Times used * Frequency = 59

Semantics: ∅

Again, these utterances are to have no meaning.
Some examples:

32 um hum.
(Remark: probably an affirmative word.)

10 oh, oh.

3 um m.

2 oh, darnit.
(8,13) s -> neg neg

Types = 1    Tokens = 5
Times used = 1    Times used * Frequency = 5

Semantics: < FALSE, FALSE >

The semantics for (8,13) is another paired
denotation. The utterances involved are:

4 no, no.

1 nope, no.

These are most likely repetitions for emphasis rather than
examples of paired denotations.
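
The paired denotations used by rules (8,4), (8,5), (8,11) and (8,13) can be illustrated with ordinary tuples. This is a minimal sketch, not the dissertation's implementation: the mapping `HEAD_VALUE`, the function `paired_denotation`, and the toy truth value chosen for [a] are all assumptions made for illustration.

```python
# Truth value contributed by the leading word, by lexical class.
HEAD_VALUE = {"aff": True, "neg": False}


def paired_denotation(head_class, rest):
    # Return the pair <value of the leading word, denotation of the rest>.
    return (HEAD_VALUE[head_class], rest)


# Rule (8,4): 'no, that's a boy.' with [a] assumed TRUE in a toy model.
no_thats_a_boy = paired_denotation("neg", True)

# Rule (8,5): 'yes, you can.' with [a] assumed TRUE in a toy model.
yes_you_can = paired_denotation("aff", True)

# Rules (8,11) and (8,13): both members come from single words.
yeah_yeah = paired_denotation("aff", HEAD_VALUE["aff"])
no_no = paired_denotation("neg", HEAD_VALUE["neg"])
```

The pair keeps the two truth values apart, so 'no, that's a boy.' can carry FALSE for the negating word and TRUE for the sentence that follows it.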

Rules (8,16) through (8,19) allow an interjection
or conjunction to be added before/after utterances without
changing the meaning. Notice that these are not recursive;
only one such word can be added.

(8,16) s -> conj a
Types = 88    Tokens = 146

Times used = 91 , Times used * Frev^uaiiCy » 14':^ 

\ 

Semantics: [a] 



(8,17) s -> a conj

Types = 2    Tokens = 2
Times used = 2    Times used * Frequency = 2

Semantics: [a]



Types = 46    Tokens = 81
Times used = 47    Times used * Frequency = 82

Semantics: [a]



^8 , 1 5 5 xnt 



Types = 8    Tokens = 13

Times used = 8    Times used * Frequency = 13



Semantics: [a] 
II. GRAMMATICAL AND SEMANTICAL AMBIGUITY

Chapter 4 contains an extensive discussion of
lexical and grammatical ambiguity in the ERICA corpus.
That discussion contains the beginning of a discussion of
the correctness of the disambiguation. However,
correctness of a syntactical construction is a problem that
really relates to the intended semantics of the grammar.
Hence, I have delayed the consideration of that problem
until this time.

I shall consider only the grammatical ambiguity
remaining in the ERICA corpus after lexical disambiguation
with the probabilistic method. There is relatively little
such ambiguity remaining, as shown in Table 1.

TABLE 1

GRAMMATICAL AMBIGUITY IN ERICA
AFTER LEXICAL DISAMBIGUATION

NUMBER OF TREES     TYPES     TOKENS
PER UTTERANCE



980 0313 
78 125 



3 ^ \ 

0 0 

1 1 



4 

5 



Hence, only 80 forms representing 127 utterances have any
grammatical ambiguity (using the probabilistic model of
lexical disambiguation, which removes some grammatical
ambiguity).

I shall say that an utterance k in sample S is
semantically ambiguous iff there are two denotations
d1, d2 for k in some model M, such that

d1 ≠ d2

Clearly, a terminal form must be grammatically ambiguous in
order to be semantically ambiguous (since each production
in the grammar concerned has only one associated semantical
rule, and since the rules apply in a unique way to a given
tree). However, it is clearly possible to have an
utterance that is grammatically ambiguous but not
semantically ambiguous. An example in ERICA concerns rule

(6,2) subj -> np prepp

(see Section 1). All but one of the forms using (6,2) are
grammatically ambiguous. Nevertheless, it is easy to show
that there is no semantical ambiguity generated. The form

4 v persp prep art n

uses this rule; the two trees involved are shown in Table
2. Both trees have the denotation:

if ( [persp] ∩
{ a | ( ∃<a,b> ∈ [prep] )
( b ∈ QUANTIF( [art] , [n] ) ) } )
⊂ [v] then TRUE else FALSE
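
The definition of semantic ambiguity can be checked mechanically once each tree's denotation has been computed. The sketch below is only an illustration under stated assumptions: QUANTIF is taken to return the noun's set unchanged, the subset test is read non-strictly, and the names and toy sets are not from the original.

```python
def related_subjects(relation, objects):
    # { a | there is a pair <a,b> in relation with b in objects }
    return {a for (a, b) in relation if b in objects}


def denotation(persp, prep, art_n, v):
    # if ([persp] n { a | ... }) c [v] then TRUE else FALSE, with
    # QUANTIF([art],[n]) assumed to be the plain set art_n.
    return (persp & related_subjects(prep, art_n)) <= v


def semantically_ambiguous(denotations):
    # Ambiguous iff the trees yield at least two distinct denotations.
    return len(set(denotations)) > 1


# Toy model for a form such as 'v persp prep art n'.
on = {("cup", "table")}      # toy denotation of the preposition
the_table = {"table"}
it = {"cup"}
put = {"cup", "ball"}

# Both trees of the form are stated to have this same denotation.
tree_without_6_2 = denotation(it, on, the_table, put)
tree_with_6_2 = denotation(it, on, the_table, put)

same_denotation = semantically_ambiguous([tree_without_6_2, tree_with_6_2])
different_denotations = semantically_ambiguous([True, False])
```

The denotations here are truth values; set-valued denotations would need to be hashable (for example `frozenset`) for the same check to work.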

TABLE 2

TREES FOR 'v persp prep art n'

[Two tree diagrams: one without rule (6,2), one with rule (6,2).]
Looking at the original listing of lexical forms
(before lexical disambiguation) we find 103 types,
representing 137 tokens, that have some grammatical
ambiguity. This grammatical ambiguity is traceable to four
basic causes in the grammar. These causes of grammatical
ambiguity are discussed below, and summarized in Table 3.

1) Prepositional phrase: Does a prepositional
phrase modify the noun phrase preceding it (see rule
(13,1)) or is it an indirect object of the verb (see rule
(3,6))? See Table 4 for the alternative semantic trees for
the form

7 persp v,aux qu,pron prep qu,pron.

2) Rule (4,7): The 4 forms using (4,7) are all
semantically ambiguous. For example,

5 mod persp

has the semantic trees shown in Table 5. The
(syntactically unnecessary) duplication of derivations was
originally due to my feeling that some of the utterances
involved might require reference to a contextual parameter
(IMMED), and others might not require such context
checking. As I have examined the many other problems
present in the corpus, this one seems irrelevant. I
mention it only to show that the technique for giving
alternative semantics for a construction is to define
separate rules with separate functions.
3) Rule (6,2): As mentioned above, most of the
utterances using (6,2) are grammatically ambiguous.
However, (6,2) does not create any semantic ambiguity.

4) Adverbial Phrases: Two or more adverbs together
cause a semantic ambiguity (see Rules (1,3) and (14,2)).
Table 6 has the trees for 'qu,pron link,aux adv adv
adj'.

This ambiguity is easy enough to eliminate from GE1
once one decides which interpretation to accept. I have
allowed it to remain because it illustrates two viable
alternative interpretations for adverbial phrases.

5) Rule (4,7) and (6,2) together: Two utterances
introduce grammatical ambiguity by using both of these
rules together. No other complex causes of grammatical
ambiguity are to be found in ERICA.

TABLE 3

CAUSES OF GRAMMATICAL AMBIGUITY IN GRAMMAR GE1

AMBIGUITY                  TYPES     TOKENS

PREPOSITIONAL PHRASES      63        89
RULE (4,7)                 7         17
RULE (6,2)                 19        23
ADVERBIAL PHRASES          6         6
RULES (4,7), (6,2)         2         2

TYPES = 103    TOKENS = 137
TABLE 4

TREES FOR 'PERSP V,AUX QU,PRON PREP QU,PRON'
(Disambiguated as 'persp v pron prep pron'.
The other alternative forms have no derivations.)

[Two tree diagrams: in one the prepositional phrase modifies
the noun phrase; in the other the prepositional phrase
modifies the verb.]
TABLE 5

TREES FOR 'MOD PERSP'

[Two tree diagrams: one without rule (4,7), one with rule (4,7).]
TABLE 6

TREES FOR 'QU,PRON LINK,AUX ADV ADV ADJ'
(The only lexical alternative recognized by GE1 is
'pron link adv adv adj'.)

[Two tree diagrams giving the alternative groupings of the
two adverbs within the adverbial phrase.]



III. PROBABILISTIC DISAMBIGUATION

The major grammatical ambiguity occurring in GE1 is
the disposition of the prepositional phrase: is it an
indirect object, or does it modify a noun-phrase? The
probabilistic grammar obtained by using the values from the
probabilistic model of lexical disambiguation (see Chapter
4) assigns a probability of .79 to the indirect object, and
.21 to the noun-phrase modifier role.

Examination of the 89 utterances in the listing
prior to lexical disambiguation yields the following:

1) Only 21 utterances are (strictly interpreted)
indirect objects. Some examples are:

1 1 losici it to her. 

1 he didn't buy any loaf for him.

1 i gonna share it with you.

GE1 predicts that we would find 71 utterances of this
class.

2) A larger than expected 32 utterances have the
prepositional phrase modifying the noun. Some examples
are:

1 i want one of those.

1 snoopy dog don't have some of that.

(Remark: Most of these utterances have prepositional
phrases like 'of these', 'of that', i.e., where the object
of the preposition is a 'pron'. GE1 had predicted that we
would find only 18 utterances of this kind.)

3) In addition, 36 utterances are adverbial phrases
modifying the verb in the utterances. Some examples are:

1 can you see them in the hole?

1 lemme have one in the score.

1 daddy put a fire on it.

1 man fixed my toe on a bed.

1 i go way in the air.

1 i can save them for my room.

GE1 does not consider these adverbial uses of the
prepositional phrase.
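
The comparison between predicted and observed counts can be laid out as a small computation. This is a sketch using only the figures quoted above; the dictionary names are illustrative. With the two-place probabilities as printed, the predictions come to about 70.3 and 18.7, close to the 71 and 18 quoted in the text; the small gap presumably reflects rounding of the printed probabilities.

```python
TOTAL = 89  # utterances examined prior to lexical disambiguation

# Probabilities assigned by the probabilistic grammar, as printed.
probabilities = {
    "indirect object": 0.79,
    "noun-phrase modifier": 0.21,
}

# Counts found by examining the utterances.
observed = {
    "indirect object": 21,
    "noun-phrase modifier": 32,
    "adverbial (not considered by GE1)": 36,
}

# Expected number of utterances in each class under the grammar.
expected = {role: p * TOTAL for role, p in probabilities.items()}

for role, count in observed.items():
    predicted = expected.get(role, 0.0)
    print(f"{role}: predicted {predicted:.1f}, observed {count}")
```

The three observed classes account for all 89 utterances, which makes the size of the adverbial class, absent from GE1, easy to see.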

In several of these utterances the prepositional
phrases seem to be objects of the verb. Notice
particularly

1 daddy put a fire on it.

1 i can save them for my room.

I think it is clear that the structure of verbs
needs to be reconsidered here. Verbs should be classed
according to the number of objects expected of them and the
rules written to account for different verb symbols. This
should also simplify the structure of interrogative
adverbs. For example, suppose that the structure of the
verb 'go' is

<subject, place>
i.e., the 'place' is where the subject is going to. Then,
we would have

[where are you going?] =

{ b | ( ∃<a,b> ∈ [are going] )
( a ∈ [you] ) }
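
Under the proposed <subject, place> structure, the denotation of the question is the set of places paired with the subject. A minimal sketch, assuming [are going] is a set of such pairs; the function name and toy data are illustrative only:

```python
def places_for(relation, subjects):
    # { b | there is a pair <a,b> in relation with a in subjects }
    return {b for (a, b) in relation if a in subjects}


# Toy denotation of 'are going' as <subject, place> pairs.
are_going = {("erica", "school"), ("daddy", "work")}
you = {"erica"}

answer = places_for(are_going, you)
```

On this reading the interrogative adverb needs no separate MEASURE function: the answer set falls out of the verb's own structure.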

This concludes my discussion of the semantics of
ERICA.
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